What this session solves
Most eval projects fail because the team jumps to a metric before agreeing on what "good" means. This session starts from the product problem and turns it into concrete, checkable eval criteria.
The case study is a Q&A agent for help-center search that has no production traffic yet. You will learn how to build an initial dataset without real user queries and how to make the first baseline useful.
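
To make "concrete eval criteria" and "initial dataset" tangible, here is a minimal sketch in Python. Everything in it is an assumption for illustration, not part of the session material: the EvalCase schema, the two criteria (cites_right_article, covers_key_facts), the article slugs, and the sample questions are all hypothetical. The idea is that each criterion is checked separately, so the baseline reports which kind of failure dominates rather than a single blended score.

```python
from dataclasses import dataclass, field


@dataclass
class EvalCase:
    """One hand-written test question (hypothetical schema)."""
    question: str
    expected_article: str  # slug of the help-center article that should be cited
    must_mention: list[str] = field(default_factory=list)  # facts a good answer contains


def grade(case: EvalCase, answer: str, cited_article: str) -> dict[str, bool]:
    """Check one agent response against each criterion separately."""
    text = answer.lower()
    return {
        "cites_right_article": cited_article == case.expected_article,
        "covers_key_facts": all(term.lower() in text for term in case.must_mention),
    }


def pass_rates(results: list[dict[str, bool]]) -> dict[str, float]:
    """Aggregate per-criterion pass rates across all graded cases."""
    if not results:
        return {}
    return {
        name: sum(r[name] for r in results) / len(results)
        for name in results[0]
    }


# Seed cases written by hand, since there is no production traffic to sample.
# Questions and slugs are invented placeholders.
SEED_CASES = [
    EvalCase("How do I reset my password?", "reset-password", ["reset link"]),
    EvalCase("Can I get a refund after 30 days?", "refund-policy", ["30 days"]),
]
```

A baseline run would call grade once per seed case with the agent's answer and cited article, then pass the list of results to pass_rates. Keeping the criteria as named booleans makes it easy to add or drop one as the team's definition of "good" changes.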