

LLM-as-a-Judge 101
Curious about AI evals, but not sure where to start? In this hands-on, beginner-friendly session, we walk you through the core building blocks of LLM-as-a-judge evaluations.
You’ll learn how to design your first evaluation from scratch, including:
What to measure: Understand the key qualities of a good metric and identify the specific criteria that will provide the most actionable insights into your application.
Which model to use: Learn how to choose the right judge model for your needs—whether you're optimizing for cost and speed or maximum quality.
How to prompt effectively: See examples of prompt formats that yield consistent, interpretable results, with tips on avoiding common pitfalls (a small illustrative sketch follows this list).
How to improve your eval: Learn how to perform meta-evaluation, conduct error analysis, and iteratively refine your prompts for stronger insights.
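
To make the prompting item above concrete, here is a minimal sketch of a single-criterion judge: it asks the model for a short reasoning plus a binary score as JSON, which keeps verdicts consistent and easy to interpret. The OpenAI Python SDK, the model name, and the faithfulness rubric below are placeholder choices for illustration, not the session's specific recommendations.

```python
# Minimal LLM-as-a-judge sketch (illustrative only; assumes the OpenAI Python SDK
# and a placeholder model name -- swap in your own client, model, and rubric).
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """You are an evaluation judge.
Rate the RESPONSE to the QUESTION on a single criterion: faithfulness
(does the response only make claims supported by the CONTEXT?).

Return JSON with two keys:
  "reasoning": a short explanation,
  "score": 1 (faithful) or 0 (not faithful).

QUESTION: {question}
CONTEXT: {context}
RESPONSE: {response}
"""

def judge_faithfulness(question: str, context: str, response: str) -> dict:
    """Ask the judge model for a structured verdict and parse it."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; choose a judge model that fits your cost/quality needs
        temperature=0,        # low temperature for more consistent verdicts
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(
                question=question, context=context, response=response
            ),
        }],
    )
    # Production code should harden this parse (e.g. strip code fences, retry on bad JSON).
    return json.loads(completion.choices[0].message.content)

if __name__ == "__main__":
    verdict = judge_faithfulness(
        question="What year was the company founded?",
        context="The company was founded in 2015 in Berlin.",
        response="It was founded in 2015.",
    )
    print(verdict)  # e.g. {"reasoning": "...", "score": 1}
```

In practice, you would run a judge like this over a sample of real outputs and compare its scores against human judgments (meta-evaluation) before trusting it at scale.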
This session is led by industry experts who have hands-on experience evaluating real-world AI applications and are deeply familiar with the latest research. You'll walk away with practical guidelines and a clear mental model for how to structure evaluations.