What this session solves
Many chat products pass single-turn tests and fail in real conversations. This session shows how to evaluate the full path from user intent to final task completion.
The case study is a text-to-SQL assistant with access to user-specific data, where incorrect context or a mishandled follow-up question can break the workflow.
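
To make the difference between single-turn and conversation-level evaluation concrete, here is a minimal sketch, not the session's actual harness. Everything in it is an assumption for illustration: the `orders` table, the two-turn script, and the two toy assistants (`context_aware`, `context_free`), which stand in for a real model. Each assistant produces valid SQL on every individual turn, but only one completes the task once the follow-up depends on earlier context.

```python
import sqlite3

# Hypothetical fixture: three orders, two of them owned by user 1.
def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, user_id INTEGER, total REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 1, 20.0), (2, 1, 80.0), (3, 2, 90.0)],
    )
    return conn

# Hypothetical two-turn conversation for user 1 and the rows that
# should come back after the final turn.
TURNS = ["Show my orders", "Only the ones over 50"]
EXPECTED = [(2, 80.0)]

def context_aware(history, user_msg):
    """Toy assistant that refines the previous query on a follow-up."""
    if not history:
        return "SELECT id, total FROM orders WHERE user_id = 1"
    return history[-1][1] + " AND total > 50"

def context_free(history, user_msg):
    """Toy assistant that ignores history and drops the user filter."""
    if "over 50" in user_msg:
        return "SELECT id, total FROM orders WHERE total > 50"
    return "SELECT id, total FROM orders WHERE user_id = 1"

def task_completed(assistant):
    """Run the whole conversation and judge only the final result."""
    conn = make_db()
    history = []
    rows = None
    try:
        for msg in TURNS:
            sql = assistant(history, msg)
            rows = conn.execute(sql).fetchall()
            history.append((msg, sql))
    except sqlite3.Error:
        return False  # any failing turn breaks the workflow
    finally:
        conn.close()
    return sorted(rows) == sorted(EXPECTED)

print(task_completed(context_aware))  # True
print(task_completed(context_free))   # False
```

The second assistant would pass a test that only checks whether each turn yields executable SQL. It fails here because the follow-up loses the user filter and returns another user's order.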