The datasets client methods are currently in BETA. The API may change without notice. A one-time warning is emitted on first use. The update method is in ALPHA.
Create versioned datasets for experimentation, evaluation, and fine-tuning. Datasets are version-controlled collections of examples; updates modify the current version in place rather than creating a new one.

Key Capabilities

  • Create datasets from Python dicts or pandas DataFrames
  • Append examples in place to existing dataset versions
  • Efficient bulk operations via Arrow Flight for large datasets
  • Cache datasets locally for faster experiment iteration

List Datasets

List all datasets with optional filtering by space or name.
resp = client.datasets.list(
    space="your-space-name-or-id",  # optional
    name="my-dataset",              # optional substring filter
    limit=50,
)

for dataset in resp.datasets:
    print(dataset.id, dataset.name)
For details on pagination, field introspection, and data conversion (to dict/JSON/DataFrame), see Response Objects.

Create a Dataset

Create a new dataset with examples for evaluation or experimentation.
examples = [
    {
        "query": "What is the capital of France?",
        "expected_output": "Paris",
        "eval.Correctness.label": "correct",
    },
    {
        "query": "Who wrote Romeo and Juliet?",
        "expected_output": "William Shakespeare",
        "eval.Correctness.label": "correct",
    },
]

dataset = client.datasets.create(
    space="your-space-name-or-id",
    name="my-test-dataset",
    examples=examples,
)
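If your examples live in a CSV file, a minimal standard-library sketch can turn each row into a dict with the same shape as the examples list above. The helper name examples_from_csv is hypothetical and not part of the SDK. It assumes the header row holds the field names, and every value is read as a string.

```python
import csv
import io


def examples_from_csv(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of example dicts, one per data row.

    The header row supplies the keys, so a column named
    "eval.Correctness.label" becomes that key in every example.
    All values are strings; convert numeric columns yourself if needed.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return list(reader)


CSV_TEXT = """query,expected_output,eval.Correctness.label
What is the capital of France?,Paris,correct
Who wrote Romeo and Juliet?,William Shakespeare,correct
"""

examples = examples_from_csv(CSV_TEXT)
print(examples[0]["expected_output"])  # Paris
```

The resulting list can be passed as examples= to client.datasets.create exactly as above. For data already in a pandas DataFrame, the Key Capabilities list indicates that create accepts the DataFrame directly.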

Get a Dataset

Retrieve a specific dataset by name or ID. When using a name, provide space to disambiguate.
dataset = client.datasets.get(
    dataset="dataset-name-or-id",
    space="your-space-name-or-id",  # required when using a name
)

print(dataset)

Delete a Dataset

Delete a dataset by name or ID. This operation is irreversible. The call does not return a value.
client.datasets.delete(
    dataset="dataset-name-or-id",
    space="your-space-name-or-id",  # required when using a name
)

print("Dataset deleted successfully")

Rename a Dataset

Rename a dataset. The new name must be unique within the space. Renaming uses the update method, which is in ALPHA.
dataset = client.datasets.update(
    dataset="dataset-name-or-id",
    space="your-space-name-or-id",  # required when using a name
    name="renamed-dataset",
)

print(dataset.name)
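Because the new name must be unique within the space, you can check it locally before calling update. This is a sketch only: name_is_available is a hypothetical helper, and it assumes exact, case-sensitive matching, which may differ from the server's rules.

```python
def name_is_available(new_name: str, existing_names: list[str]) -> bool:
    """Return True if new_name does not exactly match any existing name.

    Assumes exact, case-sensitive comparison; the server's actual
    uniqueness rules may differ.
    """
    return new_name not in existing_names


existing = ["my-test-dataset", "eval-set-v1"]
print(name_is_available("renamed-dataset", existing))  # True
print(name_is_available("eval-set-v1", existing))      # False
```

To build existing_names, you could collect dataset.name from client.datasets.list(space=..., name=new_name).datasets. Because name is a substring filter, this narrows the results to candidates, and the helper then checks for an exact match. The server remains the final authority, since another rename could land between the check and the update call.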

List Dataset Examples

Retrieve examples from a dataset with pagination support. Pass all=True to fetch every example via Arrow Flight; in that case limit is ignored.
resp = client.datasets.list_examples(
    dataset="dataset-name-or-id",
    space="your-space-name-or-id",  # required when using a name
    limit=100,
)

for example in resp.examples:
    print(example)
For details on pagination, field introspection, and data conversion (to dict/JSON/DataFrame), see Response Objects.

Append Dataset Examples

Add new examples to an existing dataset. By default, examples are appended in place to the latest dataset version; this does not create a new version. You can target a specific version by passing dataset_version_id. The response includes the dataset version the examples were written to (dataset_version_id) and the server-generated IDs of the inserted examples (example_ids).
new_examples = [
    {
        "query": "What is machine learning?",
        "expected_output": "A subset of AI focused on learning from data",
        "eval.Correctness.label": "correct",
    },
    {
        "query": "Who invented Python?",
        "expected_output": "Guido van Rossum",
        "eval.Correctness.label": "correct",
    },
]

result = client.datasets.append_examples(
    dataset="dataset-name-or-id",
    space="your-space-name-or-id",  # required when using a name
    examples=new_examples,
    # dataset_version_id="your-version-id",  # optional; defaults to the latest version
)

print(result.dataset_version_id, result.example_ids)
Note: Do not include system-managed fields (id, created_at, updated_at) in your examples. These are automatically generated by the server.
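When reusing examples fetched from another dataset (for example, list_examples results converted to dicts as described in Response Objects), strip the system-managed fields before appending. A minimal sketch, assuming each example is a plain dict; strip_system_fields and SYSTEM_MANAGED_FIELDS are hypothetical names, and the sample values are made up.

```python
# Fields the server generates; the note above says not to send them.
SYSTEM_MANAGED_FIELDS = frozenset({"id", "created_at", "updated_at"})


def strip_system_fields(examples: list[dict]) -> list[dict]:
    """Return copies of the examples without server-generated fields.

    The input dicts are left unmodified.
    """
    return [
        {k: v for k, v in ex.items() if k not in SYSTEM_MANAGED_FIELDS}
        for ex in examples
    ]


fetched = [
    {
        "id": "ex-123",  # hypothetical server-generated values
        "created_at": "2024-01-01T00:00:00Z",
        "updated_at": "2024-01-01T00:00:00Z",
        "query": "Who invented Python?",
        "expected_output": "Guido van Rossum",
    },
]

clean = strip_system_fields(fetched)
print(clean)
# [{'query': 'Who invented Python?', 'expected_output': 'Guido van Rossum'}]
```

The cleaned list can then be passed as examples= to append_examples.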

Annotate Dataset Examples

Write human annotations to a batch of examples in a dataset. Annotations are upserted by annotation config name for each example; submitting the same name for the same example overwrites the previous value. Up to 1000 examples may be annotated per request. This method returns None on success.
from arize.datasets.types import AnnotateRecordInput, AnnotationInput

client.datasets.annotate_examples(
    dataset="your-dataset-name-or-id",
    space="your-space-name-or-id",  # required when using a name
    annotations=[
        AnnotateRecordInput(
            record_id="your-example-id",
            values=[
                AnnotationInput(name="quality", score=0.9),
                AnnotationInput(name="topic", label="science"),
            ],
        ),
    ],
)
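Since each request accepts at most 1000 examples, larger annotation jobs must be split into batches. A minimal sketch with a hypothetical batched helper; the integers stand in for AnnotateRecordInput objects so the block runs without the SDK, and the commented loop shows the assumed client usage.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

MAX_EXAMPLES_PER_REQUEST = 1000  # per-request limit stated above


def batched(items: Sequence[T], size: int = MAX_EXAMPLES_PER_REQUEST) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` items, preserving order."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


records = list(range(2500))  # stand-ins for AnnotateRecordInput objects
sizes = [len(batch) for batch in batched(records)]
print(sizes)  # [1000, 1000, 500]

# With the real client (assuming `client` and a list `annotations` of
# AnnotateRecordInput objects exist), each batch becomes one request:
# for batch in batched(annotations):
#     client.datasets.annotate_examples(
#         dataset="your-dataset-name-or-id",
#         space="your-space-name-or-id",
#         annotations=batch,
#     )
```

Because annotations are upserted by name, re-sending a batch after a failed request should overwrite values rather than duplicate them. On Python 3.12 and later, itertools.batched(records, 1000) provides the same grouping but yields tuples.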
Learn more: Datasets Documentation