The spans.list and spans.annotate methods are currently in BETA and spans.delete is in ALPHA. Their APIs may change without notice and a one-time warning is emitted on first use. The remaining methods (log, update_evaluations, update_annotations, update_metadata, export_to_df, export_to_parquet) are stable.
Log, query, and update LLM traces programmatically. Upload bulk traces or update evaluations and annotations after the fact.
Key Capabilities
- List and filter spans for a project
- Bulk upload traces from offline processing
- Update evaluations asynchronously (LLM-as-judge patterns)
- Annotate spans by ID or attach annotations to traces in bulk
- Attach custom metadata for filtering and analysis
- Export spans for offline analysis
- Permanently delete spans by ID
List Spans
For downloading large volumes of spans, use export_to_df instead.
List spans for a project within an optional time window. Spans are returned in descending start-time order (most recent first). If start_time and end_time are not provided, the last seven days are queried.
from datetime import datetime

resp = client.spans.list(
    project="your-project-name-or-id",
    start_time=datetime(2024, 1, 1),  # optional
    end_time=datetime(2024, 2, 1),  # optional
    limit=100,
)
for span in resp.spans:
    print(span.span_id, span.name)
Filter Spans
Use the filter parameter to narrow results by status, evaluation labels, annotation labels, or latency:
# Filter by status
resp = client.spans.list(
    project="your-project-name-or-id",
    filter="status_code = 'ERROR'",
)

# Filter by evaluation label
resp = client.spans.list(
    project="your-project-name-or-id",
    filter="eval.Correctness.label = 'correct'",
)

# Filter by annotation label
resp = client.spans.list(
    project="your-project-name-or-id",
    filter="annotation.Quality.label = 'good'",
)

# Filter by latency
resp = client.spans.list(
    project="your-project-name-or-id",
    filter="latency_ms > 1000",
)

# Combine filters with AND / OR
resp = client.spans.list(
    project="your-project-name-or-id",
    filter="status_code = 'ERROR' AND eval.Correctness.label = 'correct'",
)
For details on pagination, field introspection, and data conversion (to dict/JSON/DataFrame), see Response Objects.
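When filters are assembled from configuration or user input, writing the string by hand gets error-prone. The sketch below composes clauses using only the syntax shown in the examples above (`=` and `>` comparisons, single-quoted string literals, `AND` / `OR`). The helper names `clause` and `combine` are hypothetical and not part of the SDK. Quote-escaping rules for the filter language are not covered here, so the sketch rejects values containing a single quote rather than guessing.

```python
def clause(field: str, op: str, value) -> str:
    """Render one comparison, quoting strings the way the examples above do."""
    if isinstance(value, str):
        if "'" in value:
            # Escaping rules are not documented here, so refuse rather than guess.
            raise ValueError("single quotes in filter values are not handled by this sketch")
        return f"{field} {op} '{value}'"
    return f"{field} {op} {value}"


def combine(clauses: list[str], joiner: str = "AND") -> str:
    """Join clauses with AND or OR."""
    if joiner not in ("AND", "OR"):
        raise ValueError("joiner must be 'AND' or 'OR'")
    return f" {joiner} ".join(clauses)


filter_expr = combine([
    clause("status_code", "=", "ERROR"),
    clause("latency_ms", ">", 1000),
])
print(filter_expr)  # status_code = 'ERROR' AND latency_ms > 1000
```

The resulting string would be passed as the `filter` argument of `client.spans.list`.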
Log Spans
Upload traces in bulk from offline processing or batch evaluation.
import pandas as pd

# Prepare spans DataFrame
spans_df = pd.DataFrame([
    {
        "context.span_id": "span-1",
        "context.trace_id": "trace-1",
        "name": "llm_call",
        "span_kind": "LLM",
        "start_time": "2024-01-15T10:00:00Z",
        "end_time": "2024-01-15T10:00:02Z",
        "attributes.llm.model_name": "gpt-4",
        "attributes.llm.input_messages": [...],
        "attributes.llm.output_messages": [...],
    },
])

# Optional: include evaluations
evals_df = pd.DataFrame([
    {
        "context.span_id": "span-1",
        "eval.Correctness.label": "correct",
        "eval.Correctness.score": 1.0,
        "eval.Correctness.explanation": "The model's response was accurate.",
    },
])

# Log spans
response = client.spans.log(
    space_id="your-space-id",
    project_name="my-llm-app",
    dataframe=spans_df,
    evals_dataframe=evals_df,  # Optional
)
print(f"Logged spans successfully: {response.status_code}")
Log Spans Only
client.spans.log(
    space_id="your-space-id",
    project_name="my-llm-app",
    dataframe=spans_df,
)
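Before uploading a large batch, it can help to catch malformed rows locally. The following is a minimal sketch and not part of the SDK: `validate_span_rows` is a hypothetical helper, and the checks it runs (a unique `context.span_id` per row, `end_time` not earlier than `start_time`) are assumptions about what a sane batch looks like, not documented server-side rules. It operates on the same list of dicts that would be passed to `pd.DataFrame`.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 timestamp; map a trailing 'Z' to an explicit UTC offset."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def validate_span_rows(rows: list[dict]) -> list[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        span_id = row.get("context.span_id")
        if not span_id:
            problems.append(f"row {i}: missing context.span_id")
        elif span_id in seen:
            problems.append(f"row {i}: duplicate span_id {span_id!r}")
        else:
            seen.add(span_id)
        start, end = row.get("start_time"), row.get("end_time")
        if start and end and parse_ts(end) < parse_ts(start):
            problems.append(f"row {i}: end_time precedes start_time")
    return problems


rows = [
    {"context.span_id": "span-1", "start_time": "2024-01-15T10:00:00Z", "end_time": "2024-01-15T10:00:02Z"},
    {"context.span_id": "span-1", "start_time": "2024-01-15T10:00:05Z", "end_time": "2024-01-15T10:00:03Z"},
]
for problem in validate_span_rows(rows):
    print(problem)
```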
Update Evaluations
Add or update evaluations for existing spans (useful for LLM-as-judge patterns).
import pandas as pd

evals_df = pd.DataFrame([
    {
        "context.span_id": "span-1",
        "eval.Relevance.label": "relevant",
        "eval.Relevance.score": 0.95,
        "eval.Relevance.explanation": "The response directly answers the question.",
    },
    {
        "context.span_id": "span-2",
        "eval.Relevance.label": "not_relevant",
        "eval.Relevance.score": 0.2,
        "eval.Relevance.explanation": "The model's response was not relevant.",
    },
])

response = client.spans.update_evaluations(
    space_id="your-space-id",
    project_name="my-llm-app",
    dataframe=evals_df,
)
print("Updated evaluations successfully")
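The `eval.<Name>.label`, `eval.<Name>.score`, and `eval.<Name>.explanation` column pattern used above is easy to mistype when several evaluators are involved. A small helper can generate the keys. `eval_row` below is a hypothetical convenience function, not an SDK API, and it assumes that fields left as `None` can simply be omitted from the row.

```python
from typing import Optional


def eval_row(span_id: str, name: str, label: Optional[str] = None,
             score: Optional[float] = None, explanation: Optional[str] = None) -> dict:
    """Build one evaluation record using the eval.<name>.<field> column convention."""
    row = {"context.span_id": span_id}
    for field, value in (("label", label), ("score", score), ("explanation", explanation)):
        if value is not None:
            row[f"eval.{name}.{field}"] = value
    return row


row = eval_row("span-1", "Relevance", label="relevant", score=0.95)
print(row)
```

A list of such rows can be passed to `pd.DataFrame`; pandas fills columns missing from a given row with NaN.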
Batch Evaluation Pattern
import pandas as pd

# Run async LLM evaluations on existing traces.
# fetch_recent_traces() and llm_judge are placeholders for your own code.
async def evaluate_traces():
    # Fetch traces to evaluate
    traces = fetch_recent_traces()

    # Run LLM-as-judge evaluations
    eval_results = []
    for trace in traces:
        score = await llm_judge.evaluate(trace)
        eval_results.append({
            "context.span_id": trace.span_id,
            "eval.Quality.score": score,  # eval.<Name>.score column convention
        })

    # Upload evaluations
    evals_df = pd.DataFrame(eval_results)
    client.spans.update_evaluations(
        space_id="your-space-id",
        project_name="my-llm-app",
        dataframe=evals_df,
    )
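The loop above awaits one judge call at a time. If the judge is I/O-bound, running calls concurrently under a cap usually shortens the batch. This is a sketch using only `asyncio`: `fake_judge` stands in for a real LLM-as-judge call, and `max_concurrency` is an arbitrary illustrative parameter. The result rows follow the `eval.<Name>.score` column convention from the earlier examples.

```python
import asyncio


async def fake_judge(span_id: str) -> float:
    """Stand-in for a real judge call; returns a deterministic score."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return 1.0 if span_id.endswith("1") else 0.5


async def evaluate_concurrently(span_ids: list[str], max_concurrency: int = 5) -> list[dict]:
    """Score spans concurrently, never running more than max_concurrency judges at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def score_one(span_id: str) -> dict:
        async with semaphore:
            score = await fake_judge(span_id)
        return {"context.span_id": span_id, "eval.Quality.score": score}

    # gather returns results in the same order as its inputs
    return await asyncio.gather(*(score_one(s) for s in span_ids))


results = asyncio.run(evaluate_concurrently(["span-1", "span-2"]))
print(results)
```

The cap matters mostly for rate-limited judge APIs; the rows in `results` can then be turned into a DataFrame and uploaded in a single `update_evaluations` call.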
Annotate Spans
Write human annotations to a batch of spans by ID. Annotations are upserted by annotation config name for each span; submitting the same name for the same span overwrites the previous value. Up to 1000 spans may be annotated per request. Spans are looked up within the specified time window (defaulting to the last 31 days). If any span ID in the batch is not found within the window, the entire request is rejected with a 404 error.
from datetime import datetime

from arize.spans.types import AnnotateRecordInput
from arize.annotation_queues.types import AnnotationInput

client.spans.annotate(
    project="your-project-name-or-id",
    space="your-space-name-or-id",  # required when project is a name
    annotations=[
        AnnotateRecordInput(
            record_id="your-span-id",
            values=[
                AnnotationInput(name="accuracy", label="correct", score=1.0),
                AnnotationInput(name="notes", text="Verified by reviewer"),
            ],
        ),
    ],
    start_time=datetime(2026, 4, 1),  # optional, defaults to 31 days ago
    end_time=datetime(2026, 5, 1),  # optional, defaults to now
)
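Because a request may carry at most 1000 spans, larger annotation jobs need to be split into batches. A minimal sketch of the batching step follows; `chunked` and `MAX_SPANS_PER_REQUEST` are hypothetical names introduced here, not SDK symbols.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")
MAX_SPANS_PER_REQUEST = 1000  # per-request limit stated above


def chunked(items: Sequence[T], size: int = MAX_SPANS_PER_REQUEST) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


span_ids = [f"span-{i}" for i in range(2500)]
batch_sizes = [len(batch) for batch in chunked(span_ids)]
print(batch_sizes)  # [1000, 1000, 500]
```

Each batch would go into its own `client.spans.annotate` call. Since one missing span ID rejects the whole request, smaller batches also limit how much work a single 404 discards.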
Update Annotations
Add human feedback and annotations to existing spans in bulk from a DataFrame.
import pandas as pd

annotations_df = pd.DataFrame([
    {
        "context.span_id": "span-1",
        "annotation.Quality.label": "correct",
        "annotation.Quality.score": 1.0,
        "annotation.Quality.text": "Verified by human reviewer",
    },
])

response = client.spans.update_annotations(
    space_id="your-space-id",
    project_name="my-llm-app",
    dataframe=annotations_df,
)
print("Updated annotations successfully")
Update Metadata
Attach or patch custom metadata on existing spans for filtering and analysis. The update_metadata method uses JSON Merge Patch semantics and supports three input approaches.
Method 1: Direct Field Columns
Set individual metadata fields using attributes.metadata.<field> column names. This is the simplest approach.
import pandas as pd

metadata_df = pd.DataFrame([
    {
        "context.span_id": "span-1",
        "attributes.metadata.customer_id": "cust-456",
        "attributes.metadata.experiment_version": "v2",
        "attributes.metadata.region": "us-west",
    },
    {
        "context.span_id": "span-2",
        "attributes.metadata.customer_id": "cust-789",
        "attributes.metadata.region": "eu-central",
    },
])

response = client.spans.update_metadata(
    space_id="your-space-id",
    project_name="my-llm-app",
    dataframe=metadata_df,
)
print(f"Updated: {response['spans_updated']}, Failed: {response['spans_failed']}")
Method 2: Patch Document Column
Provide a JSON patch document per span for more control. The patch is applied after any field columns. The default column name is "patch_document".
metadata_df = pd.DataFrame([
    {
        "context.span_id": "span-1",
        "patch_document": {"tag": "important", "priority": "high"},
    },
    {
        "context.span_id": "span-2",
        "patch_document": {"tag": "standard"},
    },
])

response = client.spans.update_metadata(
    space_id="your-space-id",
    project_name="my-llm-app",
    dataframe=metadata_df,
)
Use a custom column name with the patch_document_column_name parameter:
response = client.spans.update_metadata(
    space_id="your-space-id",
    project_name="my-llm-app",
    dataframe=metadata_df,
    patch_document_column_name="my_patch_col",
)
Method 3: Combined Approach
Use both field columns and a patch document. The patch document is applied last and overrides any conflicting field column values.
metadata_df = pd.DataFrame([
    {
        "context.span_id": "span-1",
        "attributes.metadata.tag": "important",
        "patch_document": {"priority": "high"},  # Applied after field columns
    },
])

response = client.spans.update_metadata(
    space_id="your-space-id",
    project_name="my-llm-app",
    dataframe=metadata_df,
)
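To make the precedence concrete, the sketch below computes the metadata a single row would end up requesting: field columns first, then the patch document on top. `effective_patch` is a hypothetical helper written for illustration. It mirrors the ordering described above and is not the SDK's implementation.

```python
PREFIX = "attributes.metadata."


def effective_patch(row: dict, patch_column: str = "patch_document") -> dict:
    """Combine field columns and the patch document; the patch document wins on conflicts."""
    merged = {
        key[len(PREFIX):]: value
        for key, value in row.items()
        if key.startswith(PREFIX)
    }
    merged.update(row.get(patch_column) or {})  # applied last
    return merged


row = {
    "context.span_id": "span-1",
    "attributes.metadata.tag": "important",
    "attributes.metadata.priority": "low",
    "patch_document": {"priority": "high"},
}
print(effective_patch(row))  # {'tag': 'important', 'priority': 'high'}
```

Here `priority` appears in both places, and the value from the patch document is the one that survives.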
Type Handling
| Python type | Stored as |
|---|---|
| str | string |
| int / float | number |
| bool | string ("True" / "False") |
| None | JSON null (field is set to null, not removed) |
| dict / list | JSON string |
Setting a field to None stores JSON null; it does not remove the field. This differs from standard JSON Merge Patch (RFC 7396), where a null value removes the field.
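The Type Handling table above can be read as a simple mapping. The function below is an illustrative model of it, not SDK code: `stored_form` is a hypothetical name, and using `json.dumps` for dicts and lists is an assumption about how the "JSON string" is produced.

```python
import json


def stored_form(value):
    """Model the Type Handling table: return the value as it would be stored."""
    if isinstance(value, bool):  # must precede the int check: bool is a subclass of int
        return str(value)  # "True" / "False"
    if value is None or isinstance(value, (str, int, float)):
        return value  # strings, numbers, and null pass through unchanged
    if isinstance(value, (dict, list)):
        return json.dumps(value)
    raise TypeError(f"unsupported metadata type: {type(value).__name__}")


for sample in ("v2", 3, True, None, {"a": 1}):
    print(repr(stored_form(sample)))
```

The boolean row is the one most likely to surprise: a filter written against such a field would need to compare with the string "True" rather than a boolean.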
Response Structure
update_metadata returns a dictionary with the following keys:
| Key | Description |
|---|---|
| spans_processed | Total spans in the input DataFrame |
| spans_updated | Spans successfully updated |
| spans_failed | Spans that failed to update |
| errors | List of {"span_id": ..., "error_message": ...} for each failure |
response = client.spans.update_metadata(...)

print(f"Processed: {response['spans_processed']}")
print(f"Updated: {response['spans_updated']}")
print(f"Failed: {response['spans_failed']}")
for err in response.get("errors", []):
    print(f" span {err['span_id']}: {err['error_message']}")
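Because failures are reported per span, a caller can isolate the failed rows and retry only those. This is a minimal sketch, assuming the response has the shape documented above; `rows_to_retry` is a hypothetical helper and the sample response is made up for illustration.

```python
def rows_to_retry(rows: list[dict], response: dict) -> list[dict]:
    """Return the input rows whose span IDs appear in the response's error list."""
    failed_ids = {err["span_id"] for err in response.get("errors", [])}
    return [row for row in rows if row["context.span_id"] in failed_ids]


rows = [
    {"context.span_id": "span-1", "attributes.metadata.region": "us-west"},
    {"context.span_id": "span-2", "attributes.metadata.region": "eu-central"},
]
# Illustrative response; real error messages will differ.
sample_response = {
    "spans_processed": 2,
    "spans_updated": 1,
    "spans_failed": 1,
    "errors": [{"span_id": "span-2", "error_message": "example failure"}],
}
print(rows_to_retry(rows, sample_response))
```

The returned rows can be wrapped in a new DataFrame and passed to `update_metadata` again once the cause of the failure is understood.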
Delete Spans
Permanently delete spans by their IDs. This operation is irreversible. Only spans within the 2-year lookback window are considered; older spans are not affected. Span IDs not found within the lookback window are returned in not_deleted_span_ids.
Returns a SpanDeleteResponse with:
- completed: True if no retry is needed.
- deleted_span_ids: IDs confirmed deleted.
- not_deleted_span_ids: IDs not deleted (either not found in the lookback window, or not reached when completed is False).
When completed is False, retry the original full request to complete the deletion.
result = client.spans.delete(
    project="your-project-name-or-id",
    span_ids=["span-id-1", "span-id-2"],
    space="your-space-name-or-id",  # required when project is a name
)
if result.completed:
    print(f"Deleted {len(result.deleted_span_ids)} spans")
else:
    print("Deletion incomplete; retry the original request to finish")
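The retry rule can be wrapped in a small loop. In the sketch below, `delete_until_complete` and its `max_attempts` parameter are hypothetical. The delete call is injected as a callable so the loop can be exercised without a live client, and `fake_delete` only imitates the three documented response fields.

```python
from types import SimpleNamespace
from typing import Callable


def delete_until_complete(delete_fn: Callable[[list[str]], object],
                          span_ids: list[str], max_attempts: int = 3):
    """Re-send the full original request until `completed` is True or attempts run out."""
    result = None
    for _ in range(max_attempts):
        result = delete_fn(span_ids)  # always the original, full ID list
        if result.completed:
            break
    return result


calls = []


def fake_delete(span_ids):
    """Imitate a deletion that finishes on the second request."""
    calls.append(list(span_ids))
    done = len(calls) >= 2
    return SimpleNamespace(
        completed=done,
        deleted_span_ids=span_ids if done else span_ids[:1],
        not_deleted_span_ids=[] if done else span_ids[1:],
    )


result = delete_until_complete(fake_delete, ["span-id-1", "span-id-2"])
print(result.completed, len(calls))  # True 2
```

With a real client, `delete_fn` could be a lambda such as `lambda ids: client.spans.delete(project="your-project-name-or-id", span_ids=ids)`. Even when `completed` is True, `not_deleted_span_ids` may still list IDs that were not found in the lookback window.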
Export Spans
Export spans for offline analysis, custom processing, or archival.
from datetime import datetime

start_time = datetime.strptime("2024-01-01", "%Y-%m-%d")
end_time = datetime.strptime("2026-01-01", "%Y-%m-%d")

# Export to DataFrame
df = client.spans.export_to_df(
    space_id="your-space-id",
    project_name="my-llm-app",
    start_time=start_time,
    end_time=end_time,
)
print(f"Exported {len(df)} spans")
Export to Parquet
client.spans.export_to_parquet(
    space_id="your-space-id",
    project_name="my-llm-app",
    start_time=start_time,
    end_time=end_time,
    path="./spans_export.parquet",
)
Export capabilities:
- Time-range filtering
- DataFrame or Parquet output
- Efficient Arrow Flight transport for large exports
- Progress bars for long-running exports