Session 1

Define quality for a Q&A agent

Turn product requirements into eval criteria before you pick metrics or start tuning prompts.

Case: Help-center Q&A
Format: Live workshop
Output: First eval baseline

What this session solves

Many eval projects stall because the team picks a metric before agreeing on what a good answer looks like. This session starts from the product problem and turns it into concrete eval criteria.

The case is a Q&A agent for help-center search that has no production traffic yet. You will learn how to build an initial dataset and how to read the first baseline so that it tells you what to fix next.

Agenda

  1. Map user intent, business risk, and answer quality.
  2. Write eval criteria that engineers and PMs can both use.
  3. Create synthetic examples when there is no live data.
  4. Run a first baseline and decide what to fix next.
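A minimal sketch of how the four steps might fit together in code. It assumes each criterion can be approximated by a simple string check, standing in for the human or model graders you would use in practice. The names `Criterion`, `run_baseline`, `CRITERIA`, and `EXAMPLES` are hypothetical, and the agent answers are hard-coded rather than produced by a real agent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    name: str                       # short label engineers and PMs both recognize
    risk: str                       # business risk if this criterion fails
    check: Callable[[dict], bool]   # hypothetical stand-in grader: True means pass


def cites_required_facts(ex: dict) -> bool:
    # Pass if every required fact string appears in the answer (case-insensitive).
    answer = ex["answer"].lower()
    return all(fact.lower() in answer for fact in ex["required_facts"])


def declines_out_of_scope(ex: dict) -> bool:
    # In-scope questions always pass; out-of-scope ones must decline or escalate.
    if ex["in_scope"]:
        return True
    answer = ex["answer"].lower()
    return "i don't know" in answer or "contact support" in answer


CRITERIA = [
    Criterion("correctness", "wrong steps create support tickets", cites_required_facts),
    Criterion("safe refusal", "invented policies create liability", declines_out_of_scope),
]

# Hand-written synthetic examples standing in for missing production traffic.
# The "answer" fields are hard-coded placeholders for real agent output.
EXAMPLES = [
    {"question": "How do I reset my password?", "in_scope": True,
     "required_facts": ["settings", "reset link"],
     "answer": "Go to Settings and click Send reset link."},
    {"question": "Can I get a refund after 90 days?", "in_scope": False,
     "required_facts": [],
     "answer": "Refunds are always available."},
    {"question": "Where do I change my email?", "in_scope": True,
     "required_facts": ["profile"],
     "answer": "Open your Profile page and edit the address."},
]


def run_baseline(examples: list[dict], criteria: list[Criterion]) -> dict[str, float]:
    # Pass rate per criterion across all examples.
    return {c.name: sum(c.check(ex) for ex in examples) / len(examples) for c in criteria}


if __name__ == "__main__":
    scores = run_baseline(EXAMPLES, CRITERIA)
    print(scores)
    # The weakest criterion is a reasonable first candidate for what to fix next.
    print("fix first:", min(scores, key=scores.get))
```

Keeping the business risk next to each criterion is one way to let PMs and engineers read the same baseline: a low pass rate on a high-risk criterion is a stronger signal than an overall average.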