What this session solves
A single score does not tell you what to ship. This session shows how to read traces, group failures, and connect each failure type to a product or engineering decision.
The case study is a retrieval-augmented generation (RAG) agent whose traces are noisy and whose failure modes are unclear. You will sort its failures into four groups: retrieval problems, reasoning problems, policy misses, and UX gaps.
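
As a rough illustration of grouping failures and tying each group to a decision, the sketch below counts labeled traces per failure type and attaches a next step to each. It is a minimal sketch under assumptions, not the session's own tooling: the `Trace` record, the label strings, and the `DECISIONS` mapping are all hypothetical, and the sample traces are invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from failure type to a next step. The four keys follow
# the failure groups named in the text; the actions are illustrative only.
DECISIONS = {
    "retrieval": "engineering: review indexing, chunking, or ranking",
    "reasoning": "engineering: review prompts or model choice",
    "policy": "product: review guardrails and policy wording",
    "ux": "product: review how answers are presented",
}


@dataclass
class Trace:
    """Hypothetical trace record with a manually assigned failure label."""
    trace_id: str
    failure: Optional[str]  # None means the trace showed no failure


def group_failures(traces):
    """Collect trace ids under each failure label, skipping clean traces."""
    groups = defaultdict(list)
    for trace in traces:
        if trace.failure is None:
            continue
        if trace.failure not in DECISIONS:
            raise ValueError(f"unknown failure label: {trace.failure!r}")
        groups[trace.failure].append(trace.trace_id)
    return dict(groups)


def summarize(groups):
    """Return (label, count, decision) rows, largest group first, ties by name."""
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(label, len(ids), DECISIONS[label]) for label, ids in ordered]


# Invented sample data, for illustration only.
sample = [
    Trace("t1", "retrieval"),
    Trace("t2", "reasoning"),
    Trace("t3", "retrieval"),
    Trace("t4", None),
    Trace("t5", "policy"),
]

groups = group_failures(sample)
for label, count, decision in summarize(groups):
    print(f"{label}: {count} trace(s) -> {decision}")
```

Sorting by group size puts the most frequent failure type first, which is one simple way to move from a pile of traces to a ranked list of things to fix. A single aggregate score would hide that ranking.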