Capture And Recovery Eval

CARE benchmark

Capture And Recovery Eval. Can a fresh agent recover a plan's full intent (the why, not just the what) from the plan alone?

Candidate summaries

Intent Map

Current leaders

Scoreboard

Total intent recovery

Candidates

Reference scores

Public Benchmark Library

Known source data

Model / Benchmark Matrix

Contextual resource data