Evals are first-party model scorecards. Use them to track benchmark results for models your team cares about, such as SWE-bench scores, internal support-quality scores, or latency-adjusted quality metrics. An eval contains metadata about the benchmark and a table of model scores:
| Field  | Example            |
|--------|--------------------|
| Name   | SWE-bench Verified |
| Metric | resolved           |
| Unit   | percent            |
| Model  | openai/gpt-5.5     |
| Score  | 87                 |
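The example above describes one eval with a single score row. A minimal Python sketch of that shape could look like the following. The `Eval` and `ScoreRow` classes and the `score_for` helper are hypothetical and illustrative only. They are not part of any Dari SDK; they simply mirror the benchmark metadata and score table described here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ScoreRow:
    """One row of an eval's score table (hypothetical model)."""
    model_id: str
    score: float
    notes: Optional[str] = None


@dataclass
class Eval:
    """Benchmark metadata plus a table of model scores (hypothetical model)."""
    name: str
    metric: str
    unit: str
    scores: list[ScoreRow] = field(default_factory=list)

    def score_for(self, model_id: str) -> Optional[float]:
        """Return the score recorded for model_id, or None if it has no row."""
        for row in self.scores:
            if row.model_id == model_id:
                return row.score
        return None


# The example eval from the table above.
swe_bench = Eval(
    name="SWE-bench Verified",
    metric="resolved",
    unit="percent",
    scores=[ScoreRow(model_id="openai/gpt-5.5", score=87)],
)
print(swe_bench.score_for("openai/gpt-5.5"))  # 87
```

Keeping the unit on the eval rather than on each score matches the rule that scores themselves are plain numbers.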

Upload Scores

Open Evals in the Dari dashboard and create a new eval. Upload a CSV with these columns:
```csv
model_id,score,notes
openai/gpt-5.5,87,Strong public run
anthropic/claude-sonnet-4-6,81,
openai/o3,74,
```
The model_id and score columns are required; notes is optional. Scores must be numeric, so if your metric is a percentage, enter the number (for example, 87) and set the eval's unit to percent or %.
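Before uploading, you may want to check a file against these column rules locally. The sketch below uses only Python's standard `csv` module; `parse_eval_csv` is a hypothetical helper, not a Dari tool, and it assumes a header row and comma-separated values. Dari's own upload validation may be stricter or looser.

```python
import csv
import io
import math

REQUIRED_COLUMNS = {"model_id", "score"}


def parse_eval_csv(text: str) -> list[dict]:
    """Parse an eval upload CSV and enforce the documented column rules.

    Hypothetical client-side check: model_id and score are required,
    notes is optional, and score must be a finite number.
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")

    rows = []
    for record in reader:
        line = reader.line_num  # physical line of the record just read
        model_id = (record.get("model_id") or "").strip()
        raw_score = (record.get("score") or "").strip()
        if not model_id:
            raise ValueError(f"line {line}: model_id is empty")
        try:
            score = float(raw_score)  # "87%" or "" fails here
        except ValueError:
            raise ValueError(f"line {line}: score {raw_score!r} is not numeric") from None
        if not math.isfinite(score):
            raise ValueError(f"line {line}: score must be finite")
        # An empty notes cell is treated the same as a missing note.
        notes = (record.get("notes") or "").strip() or None
        rows.append({"model_id": model_id, "score": score, "notes": notes})
    return rows


sample = """model_id,score,notes
openai/gpt-5.5,87,Strong public run
anthropic/claude-sonnet-4-6,81,
openai/o3,74,
"""
for row in parse_eval_csv(sample):
    print(row["model_id"], row["score"], row["notes"])
```

A value such as `87%` is rejected here because it is not numeric; the percent sign belongs in the eval's unit instead.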

Dari and Team Evals

Dari can publish curated global scorecards, and your organization can upload private evals. Both appear in the Evals section so you can compare public benchmarks with your own measurements.

Import Evals Into Routers

Routers can import eval scorecards from the router create and edit pages. Imported evals become structured benchmark evidence for the routing selector. At request time, Dari sends the selector only the score rows whose model_id matches one of the router's enabled models; rows for other models are left out.

Use imported evals for stable benchmark facts, such as “SWE-bench Verified: openai/gpt-5.5 scored 87.” Use routing.instructions for request-specific tradeoffs, such as cost, latency, or when to prefer stronger reasoning.
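The request-time filtering rule can be illustrated with a small sketch. It assumes score rows are plain dicts and that model IDs are compared as exact strings; `rows_for_router` is a hypothetical helper, not Dari's implementation. The scores reuse the example values from the CSV above.

```python
def rows_for_router(score_rows, enabled_models):
    """Keep only score rows whose model_id is one of the router's enabled models.

    Sketch of the documented matching rule; exact string comparison is assumed.
    Row order from the imported eval is preserved.
    """
    enabled = set(enabled_models)
    return [row for row in score_rows if row["model_id"] in enabled]


imported_rows = [
    {"model_id": "openai/gpt-5.5", "score": 87},
    {"model_id": "anthropic/claude-sonnet-4-6", "score": 81},
    {"model_id": "openai/o3", "score": 74},
]
router_models = ["openai/gpt-5.5", "openai/o3"]
print(rows_for_router(imported_rows, router_models))
```

With this router, the anthropic/claude-sonnet-4-6 row is dropped because that model is not enabled, so an eval can cover many models while each router sees only the scores relevant to it.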