The experiments client methods are currently in BETA. The API may change without notice. A one-time warning is emitted on first use.
Key Capabilities
- Automatic tracing of all LLM calls during experiments
- Concurrent execution for faster evaluation
- Dry-run mode for testing without logging
- Built-in evaluator support
- Compare experiments side-by-side in the UI
List Experiments
List all experiments, optionally filtered by dataset or space.
Create an Experiment
Log pre-computed experiment results to Arize. Use this when you’ve already executed your experiment elsewhere and want to record the results. Unlike run(), this does not execute the task; it only logs existing results.
Get an Experiment
Retrieve experiment details and metadata by name or ID. When using a name, provide dataset and optionally space to disambiguate.
Delete an Experiment
Delete an experiment by name or ID. This operation is irreversible. There is no response from this call.
Run an Experiment
Execute a task function across your dataset examples with automatic evaluation, then log the results to Arize. High-level flow:
- Resolve the dataset and download examples (cached if enabled)
- Execute the task and evaluators with configurable concurrency
- Upload results to Arize (unless in dry-run mode)
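The flow above can be illustrated with a small local sketch. It does not call the Arize SDK: `run_locally`, its parameters, the `upload` callback, and the result field `evaluations` are all hypothetical names chosen for illustration, and the real client may behave differently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

Example = dict[str, Any]
Evaluator = Callable[[Example, Any], float]


def run_locally(
    examples: list[Example],
    task: Callable[[Example], Any],
    evaluators: dict[str, Evaluator],
    upload: Callable[[list[dict[str, Any]]], None],
    concurrency: int = 4,
    dry_run: bool = False,
) -> list[dict[str, Any]]:
    """Hypothetical stand-in for the flow: execute, evaluate, then upload."""

    def run_one(example: Example) -> dict[str, Any]:
        output = task(example)
        scores = {name: ev(example, output) for name, ev in evaluators.items()}
        return {"example_id": example["id"], "output": output, "evaluations": scores}

    # pool.map preserves input order even though the calls run concurrently.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(run_one, examples))

    # Dry-run mode: keep the results local and skip the upload step.
    if not dry_run:
        upload(results)
    return results


# Toy dataset and task; a real task would typically call an LLM.
examples = [
    {"id": "ex-1", "question": "2+2", "expected": "4"},
    {"id": "ex-2", "question": "3+3", "expected": "6"},
]
uploaded: list[dict[str, Any]] = []  # stands in for the remote store

results = run_locally(
    examples,
    task=lambda ex: str(sum(int(n) for n in ex["question"].split("+"))),
    evaluators={"exact_match": lambda ex, out: 1.0 if out == ex["expected"] else 0.0},
    upload=uploaded.extend,
    concurrency=2,
    dry_run=True,
)
```

With `dry_run=True` the `uploaded` list stays empty, so the task and evaluators can be checked before any results are recorded.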
Dry Run Mode
Execute your experiment locally without logging results to Arize. Use this to test your task and evaluators before committing to a full run.
Concurrency Control
Control parallelism for faster execution.
Error Handling
Stop execution on the first error encountered.
OpenTelemetry Tracing
Set the global OpenTelemetry tracer provider for the experiment run.
List Experiment Runs
Retrieve individual runs from an experiment with pagination support. Pass all=True to fetch all runs via Flight; in that case limit is ignored.
Append Experiment Runs
Append new runs to an existing experiment. Runs are inserted in input order. Provide between 1 and 1000 runs per request. Each run must include example_id (an existing dataset example) and output; additional user-defined fields (e.g. latency_ms, model) are allowed.
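Because each request accepts between 1 and 1000 runs, a larger set of runs has to be split before it is appended. The sketch below is a local helper only, not part of the SDK: `validate_run` and `batch_runs` are hypothetical names, and the check covers only the two required fields named above.

```python
from typing import Any

MAX_RUNS_PER_REQUEST = 1000  # per-request limit stated above
REQUIRED_FIELDS = ("example_id", "output")


def validate_run(run: dict[str, Any]) -> None:
    """Raise ValueError if a run lacks a required field."""
    missing = [key for key in REQUIRED_FIELDS if key not in run]
    if missing:
        raise ValueError(f"run is missing required field(s): {', '.join(missing)}")


def batch_runs(
    runs: list[dict[str, Any]], size: int = MAX_RUNS_PER_REQUEST
) -> list[list[dict[str, Any]]]:
    """Validate every run, then split into order-preserving batches of at most `size`."""
    if not runs:
        raise ValueError("at least one run is required")
    for run in runs:
        validate_run(run)
    return [runs[start:start + size] for start in range(0, len(runs), size)]


# Extra user-defined fields such as latency_ms pass through untouched.
runs = [
    {"example_id": f"ex-{i}", "output": f"answer {i}", "latency_ms": 10 + i}
    for i in range(2500)
]
batches = batch_runs(runs)
```

Each element of `batches` could then be sent as one append request, in order, so that the overall insertion order matches the input order.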
Annotate Experiment Runs
Write human annotations to a batch of runs in an experiment. Annotations are upserted by annotation config name for each run; submitting the same name for the same run overwrites the previous value. Up to 1000 runs may be annotated per request. This method returns None on success.
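The upsert behaviour can be modelled locally with a dictionary keyed by run and annotation config name. This is only a sketch of the semantics described above, not the SDK call: `upsert_annotations`, the `store` dictionary, and the payload keys `run_id` and `annotations` are assumptions made for illustration.

```python
from typing import Any

# (run ID, annotation config name) -> value; a local stand-in for server state.
AnnotationStore = dict[tuple[str, str], Any]

MAX_RUNS_PER_REQUEST = 1000  # per-request limit stated above


def upsert_annotations(store: AnnotationStore, batch: list[dict[str, Any]]) -> None:
    """Apply a batch of annotations; the same name on the same run is overwritten."""
    if len(batch) > MAX_RUNS_PER_REQUEST:
        raise ValueError("at most 1000 runs may be annotated per request")
    for item in batch:
        for name, value in item["annotations"].items():
            store[(item["run_id"], name)] = value


store: AnnotationStore = {}

# First pass: one annotation on run-1.
upsert_annotations(store, [{"run_id": "run-1", "annotations": {"quality": "good"}}])

# Second pass: "quality" is overwritten, "reviewer" is added alongside it.
upsert_annotations(
    store,
    [{"run_id": "run-1", "annotations": {"quality": "bad", "reviewer": "alice"}}],
)
```

After both calls the store holds one value per (run, name) pair, with the later `quality` value replacing the earlier one.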