# Evaluate

> Execute code and evaluate LLM performance with precision

Hands-on guides for the **Evaluate** stage of the AX workflow: building evaluators, aligning them with human judgment, and measuring quality.

<CardGroup cols={2}>
  <Card title="Evaluations Quickstart" href="/ax/cookbooks/evaluate/evaluations-quickstart">Get started running evaluations to measure how your model performs.</Card>
  <Card title="Align LLM Evals with Human Judgment" href="/ax/cookbooks/evaluate/align-llm-evals-with-human-judgment">Iteratively refine a custom LLM-as-a-Judge evaluator against human-annotated ground truth.</Card>
  <Card title="Why Public Benchmarks Lie: Building Your Own Eval Harness" href="/ax/cookbooks/evaluate/model-comparison-for-an-email-text-extraction-service">Build your own eval harness instead of trusting public benchmarks, using an email text-extraction service as the worked example.</Card>
  <Card title="Trace-Level Evaluations for a Recommendation Agent" href="/ax/cookbooks/evaluate/trace-level-evaluations-for-a-recommendation-agent">Run trace-level evaluations on individual requests to a recommendation agent.</Card>
  <Card title="Session-Level Evaluations for an AI Tutor" href="/ax/cookbooks/evaluate/session-level-evaluations-for-an-ai-tutor">Run multi-dimensional session-level evaluations on multi-turn AI tutor conversations.</Card>
  <Card title="Evaluating RAG Retrieval Quality and Correctness" href="/ax/cookbooks/evaluate/evaluating-rag">Create and evaluate a RAG application to improve retrieval quality and correctness.</Card>
  <Card title="Retrieval Evaluation" href="/ax/cookbooks/evaluate/retrieval-evaluation">Debug RAG retrieval quality with embeddings and LLM-assisted metrics.</Card>
  <Card title="Evaluating Agentic RAG Using Arize AX and Couchbase" href="/ax/cookbooks/evaluate/evaluating-agentic-rag-using-arize-and-couchbase">Build and evaluate an agentic RAG application on a Couchbase vector store.</Card>
  <Card title="Evaluating a RAG-Powered Chatbot" href="/ax/cookbooks/evaluate/llamaindex-evals">Monitor and debug a LlamaIndex RAG-powered chatbot with traces and spans.</Card>
  <Card title="Evaluate a Math Problem-Solving Agent Using Ragas" href="/ax/cookbooks/evaluate/ragas-agents-cookbook">Create and evaluate a math problem-solving agent using Ragas and Arize AX.</Card>
  <Card title="Pydantic Evals" href="/ax/cookbooks/evaluate/pydantic-evals">Evaluate a question-answering task with Pydantic Evals and log results to Arize AX.</Card>
  <Card title="Tracing and Evaluating Voice Applications" href="/ax/cookbooks/evaluate/tracing-and-evaluating-audio">Trace OpenAI Realtime voice agents and run tone evaluation on captured audio.</Card>
  <Card title="Audio Transcription and Evaluation with Gemini Flash" href="/ax/cookbooks/evaluate/gemini-audio-evals">Transcribe and evaluate audio with Gemini Flash, traced in Arize AX.</Card>
  <Card title="More Guides" href="/ax/cookbooks/evaluate/evaluation">Span-level evaluator examples for hallucination, relevance, toxicity, SQL, tool calling, and more.</Card>
</CardGroup>
