# Evaluate

> Execute code and evaluate LLM performance with precision

Hands-on guides for the **Evaluate** stage of the AX workflow: building evaluators, aligning them with human judgment, and measuring quality.

- Get started running evaluations to measure how your model performs.
- Iteratively refine a custom LLM-as-a-Judge evaluator against human-annotated ground truth.
- Build your own eval harness instead of relying on public benchmarks, using an email-extraction service as the example.
- Run trace-level evaluations on individual requests to a recommendation agent.
- Run multi-dimensional, session-level evaluations on multi-turn AI tutor conversations.
- Create and evaluate a RAG application to improve retrieval quality and correctness.
- Debug RAG retrieval quality with embeddings and LLM-assisted metrics.
- Build and evaluate an agentic RAG application on a Couchbase vector store.
- Monitor and debug a LlamaIndex RAG-powered chatbot with traces and spans.
- Create and evaluate a math problem-solving agent using Ragas and Arize AX.
- Evaluate a question-answering task with Pydantic Evals and log the results to Arize AX.
- Trace OpenAI Realtime voice agents and run tone evaluation on the captured audio.
- Transcribe and evaluate audio with Gemini Flash, traced in Arize AX.
- Browse span-level evaluator examples for hallucination, relevance, toxicity, SQL, tool calling, and more.