> ## Documentation Index
> Fetch the complete documentation index at: https://arize-ax.mintlify.site/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# Overview

> Guides & tutorials to help you build with Arize AX

## Instrument

Capture traces and spans from your LLM and agent applications.

- Enrich auto-instrumented traces with LLM, tool, agent, chain, and session attributes.
- Use Arize integrations to automatically collect LLM traces.
- Create and evaluate a custom support agent with Arize AX to improve performance.
- Create and evaluate agents with the OpenAI Agents SDK in Arize AX.
- Scaffold a Vercel Eve agent and add Arize AX observability through OpenTelemetry.
- Split-stream OpenTelemetry traces into both Arize AX and Databricks Unity Catalog.

## Observe

Monitor your applications in production and surface high-signal issues.

- Run online evals and monitor a tool-calling LangGraph agent in production.
- Decide what to guard at input vs. output and layer guardrails without blocking real users.

## Evaluate

Build evaluators, align them with human judgment, and measure quality.

- Get started running evaluations to measure how your model performs.
- Iteratively refine a custom LLM-as-a-Judge evaluator against human-annotated ground truth.
- Build your own eval harness instead of trusting public benchmarks, via an email-extraction service.
- Run trace-level evaluations on individual requests to a recommendation agent.
- Run multi-dimensional session-level evaluations on multi-turn AI tutor conversations.
- Create and evaluate a RAG application to improve retrieval quality and correctness.
- Debug RAG retrieval quality with embeddings and LLM-assisted metrics.
- Build and evaluate an agentic RAG application on a Couchbase vector store.
- Monitor and debug a LlamaIndex RAG-powered chatbot with traces and spans.
- Create and evaluate a math problem-solving agent using Ragas and Arize AX.
- Evaluate a question-answering task with Pydantic Evals and log results to Arize AX.
- Trace OpenAI Realtime voice agents and run tone evaluation on captured audio.
- Transcribe and evaluate audio with Gemini Flash, traced in Arize AX.
- Span-level evaluator examples for hallucination, relevance, toxicity, SQL, tool calling, and more.

## Improve

Run experiments, optimize prompts, and add guardrails.

- An end-to-end walkthrough of the prompt iteration cycle using a trip-planner use case.
- Experiment with prompts to optimize a summarization task.
- Build and optimize a Text2SQL application for database querying from scratch.
- Use Prompt Learning to improve accuracy on structured output generation.
- Optimize coding agent prompts for the planning phase with Prompt Learning.
- Optimize coding agent prompts for execution and track improvement.
- Use Prompt Learning to improve your LLM evaluation prompts.
- Add realtime guardrails so production LLM apps output safe responses.

## Advanced Workflows

End-to-end guides for complex multi-agent, multi-modal, and security-focused systems.

- Build and deploy a LangGraph product-recommendation agent on Vertex AI Agent Engine.
- Build a multi-agent trading system with Google ADK, the A2A protocol, MCP, and Llama.
- Build and trace a multi-modal autonomous browser agent powered by Llama 4.
- Trace a LangChain agent and run Microsoft Foundry risk and safety evaluators.
- Trace Microsoft Foundry Red Teaming Agent scans against your LLM or agent.
- Red-team an assistant across an attack taxonomy, score Attack Success Rate, and find which defenses work.
- Advanced experiments and benchmarks in LLM evaluation, instrumentation, and agent systems.