

LLM-as-a-Judge 101
Curious about AI evals, but not sure where to start? In this hands-on, beginner-friendly session, we walk you through the core building blocks of LLM-as-a-judge evaluations.
You’ll learn how to design your first evaluation from scratch, including:
What to measure: Understand the key qualities of a good metric and identify the specific criteria that will provide the most actionable insights into your application.
Which model to use: Learn how to choose the right judge model for your needs—whether you're optimizing for cost and speed or maximum quality.
How to prompt effectively: See examples of prompt formats that yield consistent, interpretable results, with tips on avoiding common pitfalls (a small illustrative sketch follows this list).
How to improve your eval: Learn how to perform meta-evaluation, conduct error analysis, and iteratively refine your prompts for stronger insights.
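
To make the prompting item above concrete, here is a minimal sketch of a single-criterion judge: it asks the model for a short reasoning plus a binary score as JSON, which keeps verdicts consistent and easy to interpret. The OpenAI Python SDK, the model name, and the faithfulness rubric below are placeholder choices for illustration, not the session's specific recommendations.

```python
# Minimal LLM-as-a-judge sketch (illustrative only; assumes the OpenAI Python SDK
# and a placeholder model name -- swap in your own client, model, and rubric).
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """You are an evaluation judge.
Rate the RESPONSE to the QUESTION on a single criterion: faithfulness
(does the response only make claims supported by the CONTEXT?).

Return JSON with two keys:
  "reasoning": a short explanation,
  "score": 1 (faithful) or 0 (not faithful).

QUESTION: {question}
CONTEXT: {context}
RESPONSE: {response}
"""

def judge_faithfulness(question: str, context: str, response: str) -> dict:
    """Ask the judge model for a structured verdict and parse it."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; choose a judge model that fits your cost/quality needs
        temperature=0,        # low temperature for more consistent verdicts
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(
                question=question, context=context, response=response
            ),
        }],
    )
    # Production code should harden this parse (e.g. strip code fences, retry on bad JSON).
    return json.loads(completion.choices[0].message.content)

if __name__ == "__main__":
    verdict = judge_faithfulness(
        question="What year was the company founded?",
        context="The company was founded in 2015 in Berlin.",
        response="It was founded in 2015.",
    )
    print(verdict)  # e.g. {"reasoning": "...", "score": 1}
```

In practice, you would run a judge like this over a sample of real outputs and compare its scores against human judgments (meta-evaluation) before trusting it at scale.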
This session is led by industry experts who have hands-on experience evaluating real-world AI applications and are deeply familiar with the latest research. You'll walk away with practical guidelines and a clear mental model for how to structure evaluations.