Session 5

Roll evals into production

Make evals part of release management, not a one-off analysis that disappears after launch.

Case: Launch readiness
Format: Operating model
Output: Release workflow

What this session solves

Evals only matter if they change decisions. This session turns eval work into launch gates, weekly review rituals, and reporting that product, engineering, legal, and leadership can use.

The case: an AI feature is heading to production under stakeholder pressure, with unclear risk and a need for a repeatable release process.

Agenda

  1. Define what must be true before launch.
  2. Decide who owns datasets, metrics, and regressions.
  3. Report eval results without hiding risk.
  4. Keep quality work alive after release.
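
As an illustration of the first three agenda items, here is a minimal sketch of a launch gate check. Everything in it is an assumption made for the example: the metric names, thresholds, and owners are hypothetical, and a real team would set its own. The idea is that each gate has a named owner, and a missing result counts as a failure rather than being silently skipped, so the report cannot hide risk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """One condition that must hold before launch (all fields are illustrative)."""
    metric: str
    threshold: float
    higher_is_better: bool = True
    owner: str = "unassigned"


def evaluate_gates(results, gates):
    """Return a list of human-readable failures; an empty list means all gates pass."""
    failures = []
    for gate in gates:
        value = results.get(gate.metric)
        if value is None:
            # A metric nobody measured is treated as a failed gate, not a pass.
            failures.append(f"{gate.metric}: no result reported (owner: {gate.owner})")
            continue
        if gate.higher_is_better:
            passed = value >= gate.threshold
        else:
            passed = value <= gate.threshold
        if not passed:
            direction = ">=" if gate.higher_is_better else "<="
            failures.append(
                f"{gate.metric}: {value} violates {direction} {gate.threshold} "
                f"(owner: {gate.owner})"
            )
    return failures


# Hypothetical gates; names, thresholds, and owners are assumptions for this sketch.
GATES = [
    Gate("answer_accuracy", 0.90, True, "product"),
    Gate("unsafe_output_rate", 0.01, False, "legal"),
    Gate("p95_latency_seconds", 2.0, False, "engineering"),
]


def launch_decision(results, gates=GATES):
    """Map eval results to a ship/block decision plus the reasons for blocking."""
    failures = evaluate_gates(results, gates)
    return ("block" if failures else "ship", failures)
```

Calling `launch_decision` with a results dictionary from an eval run returns either `("ship", [])` or `("block", [...])` with one line per failed gate, which can be pasted directly into a release review. Running the same check on a schedule after launch is one way to keep regressions visible.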