Create versioned datasets for experimentation, evaluation, and fine-tuning. Supports Python dicts and pandas DataFrames.
The datasets client methods are currently in BETA. The API may change without notice. A one-time warning is emitted on first use. The update method is in ALPHA.
Datasets are version-controlled collections of examples. Updates, such as appending examples, modify the current version in place rather than creating a new one.
Delete a dataset by name or ID. This operation is irreversible. There is no response from this call.
client.datasets.delete(
    dataset="dataset-name-or-id",
    space="your-space-name-or-id",  # required when using a name
)
print("Dataset deleted successfully")
Rename a dataset. The new name must be unique within the space.
dataset = client.datasets.update(
    dataset="dataset-name-or-id",
    space="your-space-name-or-id",  # required when using a name
    name="renamed-dataset",
)
print(dataset.name)
Retrieve examples from a dataset with pagination support. Pass all=True to fetch every example via Flight; in that case, limit is ignored.
resp = client.datasets.list_examples(
    dataset="dataset-name-or-id",
    space="your-space-name-or-id",  # required when using a name
    limit=100,
)
for example in resp.examples:
    print(example)
For details on pagination, field introspection, and data conversion (to dict/JSON/DataFrame), see Response Objects.
Add new examples to an existing dataset. By default, examples are appended in place to the latest dataset version; this does not create a new version. To target a specific version, pass dataset_version_id. The response includes the dataset version the examples were written to (dataset_version_id) and the server-generated IDs of the inserted examples (example_ids).
new_examples = [
    {
        "query": "What is machine learning?",
        "expected_output": "A subset of AI focused on learning from data",
        "eval.Correctness.label": "correct",
    },
    {
        "query": "Who invented Python?",
        "expected_output": "Guido van Rossum",
        "eval.Correctness.label": "correct",
    },
]

result = client.datasets.append_examples(
    dataset="dataset-name-or-id",
    space="your-space-name-or-id",  # required when using a name
    examples=new_examples,
)
print(result.dataset_version_id, result.example_ids)
Note: Do not include system-managed fields (id, created_at, updated_at) in your examples. These are automatically generated by the server.
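If your examples come from somewhere that already carries these fields (for instance, records copied from another dataset and converted to plain dicts), you can drop them before calling append_examples. The following is a minimal sketch that assumes each example is a plain Python dict. The helper name strip_system_fields is hypothetical and not part of the SDK; the field names are the three listed in the note above.

```python
from typing import Any, Dict, Iterable, List

# System-managed fields listed in the note; the server generates these itself.
SYSTEM_MANAGED_FIELDS = frozenset({"id", "created_at", "updated_at"})


def strip_system_fields(examples: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: return new dicts with system-managed keys removed.

    The input dicts are not modified; each example is shallow-copied.
    """
    return [
        {key: value for key, value in example.items() if key not in SYSTEM_MANAGED_FIELDS}
        for example in examples
    ]
```

The cleaned list can then be passed as examples= to client.datasets.append_examples.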
Write human annotations to a batch of examples in a dataset. Annotations are upserted by annotation config name for each example; submitting the same name for the same example overwrites the previous value. Up to 1000 examples may be annotated per request. This method returns None on success.
from arize.datasets.types import AnnotateRecordInput, AnnotationInput

client.datasets.annotate_examples(
    dataset="your-dataset-name-or-id",
    space="your-space-name-or-id",  # required when using a name
    annotations=[
        AnnotateRecordInput(
            record_id="your-example-id",
            values=[
                AnnotationInput(name="quality", score=0.9),
                AnnotationInput(name="topic", label="science"),
            ],
        ),
    ],
)
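Because each request accepts at most 1000 examples, larger annotation sets have to be split across several calls. The sketch below shows one way to do this. The helpers chunked and annotate_in_batches are hypothetical and not part of the SDK. The actual request is supplied as a submit callback, so the batching logic does not depend on the client.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Per-request limit stated for annotate_examples.
MAX_EXAMPLES_PER_REQUEST = 1000


def chunked(items: Sequence[T], size: int = MAX_EXAMPLES_PER_REQUEST) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items, preserving order."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def annotate_in_batches(
    records: Sequence[T],
    submit: Callable[[List[T]], None],
    size: int = MAX_EXAMPLES_PER_REQUEST,
) -> int:
    """Call `submit` once per batch and return the number of requests made."""
    requests = 0
    for batch in chunked(records, size):
        submit(batch)
        requests += 1
    return requests
```

With the SDK, records would be a list of AnnotateRecordInput objects. submit could be something like `lambda batch: client.datasets.annotate_examples(dataset="your-dataset-name-or-id", space="your-space-name-or-id", annotations=batch)`. Annotations are upserted by name, so resubmitting a batch after a failure overwrites the earlier values instead of duplicating them.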