Create and manage LLM-as-judge evaluators and their versions programmatically. Evaluators use prompt templates with {{variable}} placeholders that reference span or trace attributes to automatically score your LLM application’s outputs.

Key Capabilities

  • Create template-based LLM-as-judge evaluators within a space
  • Version evaluators with commit messages (versions are immutable once created)
  • Retrieve evaluators with their latest or a specific version
  • List, update, and delete evaluators
  • List and retrieve individual evaluator versions

List Evaluators

Evaluator operations are currently in ALPHA. A one-time warning is emitted on first use.
List all evaluators you have access to, with optional filtering by space.
resp = client.evaluators.list(
    space="your-space-name-or-id",  # optional
    name="Relevance",               # optional substring filter
    limit=50,
)

for evaluator in resp.evaluators:
    print(evaluator.id, evaluator.name)
For details on pagination, field introspection, and data conversion (to dict/JSON/DataFrame), see Response Objects.

Create an Evaluator

Create a new evaluator with an initial version. Evaluator names must be unique within the target space.
from arize._generated.api_client.models import TemplateConfig, EvaluatorLlmConfig

evaluator = client.evaluators.create(
    name="Relevance",
    space="your-space-name-or-id",
    commit_message="Initial version",
    description="Scores whether the response is relevant to the query",
    template_config=TemplateConfig(
        name="Relevance",
        template="Is the following response relevant to the query?\nQuery: {{input.value}}\nResponse: {{output.value}}",
        include_explanations=True,
        use_function_calling_if_available=True,
        classification_choices={"relevant": 1, "irrelevant": 0},
        direction="maximize",
        llm_config=EvaluatorLlmConfig(
            ai_integration_id="your-ai-integration-id",
            model_name="gpt-4o",
            invocation_parameters={"temperature": 0},
        ),
    ),
)

print(evaluator.id, evaluator.name)

Template Variables

Template strings use {{variable}} placeholders that reference span or trace attributes (e.g., {{input.value}}, {{output.value}}, {{attributes.my_custom_attr}}).
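For intuition, placeholder resolution can be pictured as a substitution over a flat mapping of span attributes. The `render` helper and the attribute values below are a hypothetical sketch, not part of the Arize client:

```python
import re

def render(template: str, attributes: dict) -> str:
    """Replace each {{path}} placeholder with the matching attribute value.

    Illustrative only: unknown placeholders are left intact rather than
    raising, mirroring a lenient substitution strategy.
    """
    def substitute(match: re.Match) -> str:
        path = match.group(1).strip()
        value = attributes.get(path)
        return str(value) if value is not None else match.group(0)

    return re.sub(r"\{\{(.*?)\}\}", substitute, template)

# Example span attributes a template might reference
span_attributes = {
    "input.value": "What is the capital of France?",
    "output.value": "Paris is the capital of France.",
}

prompt = render(
    "Query: {{input.value}}\nResponse: {{output.value}}",
    span_attributes,
)
print(prompt)
```

Custom attributes follow the same pattern: a placeholder like {{attributes.my_custom_attr}} would simply be another key in the mapping.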

Classification vs. Freeform Output

  • Classification — Provide classification_choices as a dict[str, float] mapping label → numeric score (e.g., {"relevant": 1, "irrelevant": 0}). The evaluator outputs one of these labels along with its score.
  • Freeform — Omit classification_choices. The evaluator produces a numeric score without predefined labels.
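Conceptually, classification mode is a label-to-score lookup: the judge picks one of the predefined labels, and the score is read from the mapping. The `score_for_label` helper below is a hypothetical illustration of that lookup, not part of the client:

```python
# Same shape as the classification_choices argument shown above
classification_choices = {"relevant": 1.0, "irrelevant": 0.0}

def score_for_label(label: str, choices: dict[str, float]) -> float:
    """Map the judge's chosen label to its numeric score."""
    if label not in choices:
        raise ValueError(f"Label {label!r} is not one of {sorted(choices)}")
    return choices[label]

# The LLM judge emits one of the predefined labels; both the label
# and the corresponding score are recorded.
print(score_for_label("relevant", classification_choices))
```

With direction="maximize", higher scores are treated as better, so ordering the mapping's values to match your notion of quality matters.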

Get an Evaluator

Retrieve an evaluator by name or ID. By default the latest version is returned. When using a name, provide space to disambiguate.
evaluator = client.evaluators.get(evaluator="your-evaluator-name-or-id")

print(evaluator.id, evaluator.name)
print(evaluator.version)

Get a Specific Version

evaluator = client.evaluators.get(
    evaluator="your-evaluator-name-or-id",
    version_id="specific-version-id",
)

Update an Evaluator

Update an evaluator’s metadata (name and/or description). To change the template configuration, create a new version instead.
evaluator = client.evaluators.update(
    evaluator="your-evaluator-name-or-id",
    name="Relevance v2",
    description="Updated description",
)

print(evaluator)

Delete an Evaluator

Delete an evaluator and all of its versions. This operation is irreversible and returns no response body.
client.evaluators.delete(evaluator="your-evaluator-name-or-id")

print("Evaluator deleted successfully")

Manage Versions

Evaluator versions are immutable once created. To change the template configuration, create a new version — it becomes the latest version immediately.

List Versions

List all versions for an evaluator.
resp = client.evaluators.list_versions(
    evaluator="your-evaluator-name-or-id",
    limit=50,
)

for version in resp.evaluator_versions:
    print(version.id, version.commit_message)
For details on pagination, field introspection, and data conversion (to dict/JSON/DataFrame), see Response Objects.

Get a Version

Retrieve a specific evaluator version by its ID.
version = client.evaluators.get_version(version_id="your-version-id")

print(version.id, version.commit_message)

Create a New Version

Add a new version to an existing evaluator. The new version becomes the latest immediately.
from arize._generated.api_client.models import TemplateConfig, EvaluatorLlmConfig

version = client.evaluators.create_version(
    evaluator="your-evaluator-name-or-id",
    commit_message="Improved prompt for edge cases",
    template_config=TemplateConfig(
        name="Relevance",
        template="Rate the relevance of the response on a scale of 0 to 1.\nQuery: {{input.value}}\nResponse: {{output.value}}",
        include_explanations=True,
        use_function_calling_if_available=True,
        classification_choices={"relevant": 1, "irrelevant": 0},
        direction="maximize",
        llm_config=EvaluatorLlmConfig(
            ai_integration_id="your-ai-integration-id",
            model_name="gpt-4o",
            invocation_parameters={"temperature": 0},
        ),
    ),
)

print(version.id)
Learn more: Online Evaluations Documentation