Testing a prompt means running it against your data and scoring outputs with evaluators. You can replay production traces, compare versions side by side, and catch regressions before anything ships.
Once a prompt is saved to Prompt Hub, attach a dataset, run the model, and open View Experiment for a full breakdown of results. For the conceptual model (what the Playground is for and its three modes: run on dataset, replay on spans, and side-by-side compare), see The Prompt Playground.
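To make the idea concrete, here is a minimal, product-agnostic sketch of what "run a prompt against a dataset and score it with an evaluator" amounts to. It is not the Playground's or Prompt Hub's API. Every name in it (`DATASET`, `stub_model`, `exact_match`, `run_experiment`, and the two templates) is hypothetical, and the model is a deterministic stand-in so the example runs offline.

```python
from typing import Callable

# Hypothetical dataset: each row holds template inputs and an expected output.
DATASET = [
    {"inputs": {"country": "France"}, "expected": "Paris"},
    {"inputs": {"country": "Japan"}, "expected": "Tokyo"},
]

CAPITALS = {"France": "Paris", "Japan": "Tokyo"}


def stub_model(prompt: str) -> str:
    """Deterministic stand-in for an LLM call (an assumption, not a real client)."""
    for country, capital in CAPITALS.items():
        if country in prompt:
            # Pretend the model is terse only when the prompt asks for it.
            if "one word" in prompt:
                return capital
            return f"The capital of {country} is {capital}."
    return "I don't know."


def exact_match(output: str, expected: str) -> float:
    """Simple evaluator: 1.0 if the output equals the expected answer, else 0.0."""
    return 1.0 if output.strip() == expected else 0.0


def run_experiment(
    template: str,
    dataset: list[dict],
    model: Callable[[str], str],
    evaluator: Callable[[str, str], float],
) -> list[float]:
    """Fill the template for each row, call the model, and score the output."""
    scores = []
    for row in dataset:
        prompt = template.format(**row["inputs"])
        scores.append(evaluator(model(prompt), row["expected"]))
    return scores


# Two hypothetical versions of the same prompt.
V1 = "What is the capital of {country}?"
V2 = "Answer in one word. What is the capital of {country}?"

scores_v1 = run_experiment(V1, DATASET, stub_model, exact_match)
scores_v2 = run_experiment(V2, DATASET, stub_model, exact_match)

# A row regresses if the new version scores lower than the old one.
regressions = [i for i, (old, new) in enumerate(zip(scores_v1, scores_v2)) if new < old]

print("v1 mean:", sum(scores_v1) / len(scores_v1))
print("v2 mean:", sum(scores_v2) / len(scores_v2))
print("regressed rows:", regressions)
```

Scoring each row separately, instead of only comparing averages, is what lets a side-by-side comparison point at the specific examples a new version broke.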

Workflow

Coming soon!

Next up

Once you have runs and eval signals you trust, continue to Optimize a prompt to turn that feedback into an improved template.