Session 3

Evaluate image generation

Separate safety, brand fit, and visual quality so subjective review becomes a repeatable process.

Case: Payment card images
Format: Rubric design
Output: Visual judge protocol

What this session solves

Image evals get messy when policy, taste, and technical defects are mixed together. This session breaks the problem into clear checks that reviewers and automated judges can run consistently.

The case is custom image generation for payment cards, where an image must clear both safety review and brand-quality review before launch.

Agenda

  1. Split policy checks from quality checks.
  2. Define observable failure modes for generated images.
  3. Design a VLM judge that does not overreach.
  4. Escalate ambiguous cases to human review.
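
The agenda can be read as a routing rule. Below is a minimal Python sketch of one way to combine the checks, not the session's actual protocol. The names `Verdict`, `CheckResult`, and `decide`, the check names, and the four outcomes are assumptions made for illustration. The idea is that each check returns its own verdict, a policy failure blocks the image regardless of quality, and any check the judge cannot settle goes to a human instead of being guessed.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"
    UNSURE = "unsure"  # the judge declines to decide rather than overreach


@dataclass
class CheckResult:
    name: str         # hypothetical check name, e.g. "no_prohibited_content"
    kind: str         # "policy" or "quality"
    verdict: Verdict


def decide(results):
    """Combine per-check verdicts into one routing decision (illustrative)."""
    policy = [r for r in results if r.kind == "policy"]
    quality = [r for r in results if r.kind == "quality"]

    # A policy failure blocks the image no matter how good it looks.
    if any(r.verdict is Verdict.FAIL for r in policy):
        return "reject"
    # Any unsettled check, policy or quality, is escalated to a human.
    if any(r.verdict is Verdict.UNSURE for r in results):
        return "human_review"
    # Quality failures are fixable, so the image is regenerated.
    if any(r.verdict is Verdict.FAIL for r in quality):
        return "regenerate"
    return "approve"


if __name__ == "__main__":
    example = [
        CheckResult("no_prohibited_content", "policy", Verdict.PASS),
        CheckResult("text_legible", "quality", Verdict.UNSURE),
    ]
    print(decide(example))
```

Keeping policy and quality as separate fields, instead of folding them into one score, means a strong visual result can never offset a safety failure.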