Capture And Recovery Eval
Every coding agent starts by turning a spec into a plan — and a plan is a lossy handoff. It captures what to build but quietly sheds the why: the trade-offs, constraints, and reasoning that justified each decision. CARE measures how much of that intent survives the planning step — and whether a fresh agent can recover it from the plan alone.
In real agentic pipelines, plans get passed between sessions, agents, and people. When the reasoning behind a plan evaporates, downstream work drifts — rebuilding the letter of the spec while losing its intent, often with no one noticing. CARE is a stress test for that durability: does the why travel with the plan, or leak out the moment its author's context is gone?
Capture
A candidate model reads a product spec and writes an implementation plan.
Recovery
A fresh agent — shown only that plan, never the spec — reconstructs the original intent and rationale from it alone.
Eval
A scorer measures the gap between what the plan states on its surface and how much of the original intent is actually recoverable from it.
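The three stages above can be sketched as a small pipeline. This is only an illustration under stated assumptions: `run_care`, `CareRun`, and the three callables are hypothetical names, and the scorer signature is a guess, not CARE's actual interface. The point it shows is the information barrier: the reconstructor is handed the plan and nothing else.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CareRun:
    """Result of one capture -> recovery -> eval pass (hypothetical container)."""
    plan: str
    recovered_intent: str
    score: float


def run_care(
    spec: str,
    planner: Callable[[str], str],         # candidate model: spec -> plan
    reconstructor: Callable[[str], str],   # fresh agent: plan -> recovered intent
    scorer: Callable[[str, str], float],   # assumed signature: (plan, recovered) -> score
) -> CareRun:
    # Capture: the candidate reads the spec and writes a plan.
    plan = planner(spec)
    # Recovery: the reconstructor receives only the plan, never the spec.
    recovered = reconstructor(plan)
    # Eval: the scorer compares the plan's surface with what was recovered.
    score = scorer(plan, recovered)
    return CareRun(plan=plan, recovered_intent=recovered, score=score)
```

Passing the stages in as callables keeps the sketch independent of any particular model or scoring method; a real scorer would presumably also consult the spec's reference rationale.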
What the score means
Each spec ships with a set of gold whys (the rationale behind every requirement), weighted by importance and split into system-level intent (the decisions that shape the architecture) and feature-level intent. A run's score is the share of that weighted intent a blind reconstructor recovers from the plan. The Intent Map plots two axes: planning quality (how sound the plan is on its own terms) against total intent recovery (how much of the original why actually made it through).
Plans that carry their reasoning forward score well. Plans that look complete but quietly lose the why don't.
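The weighted share described above can be written out as a short calculation. This is a minimal sketch, assuming each gold why is judged as either recovered or not; the source does not say whether partial credit exists. `GoldWhy`, `weighted_recovery`, and the example entries and weights are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class GoldWhy:
    key: str       # hypothetical identifier for one piece of rationale
    weight: float  # importance weight
    level: str     # "system" or "feature"


def weighted_recovery(
    whys: Iterable[GoldWhy],
    recovered: Set[str],
    level: Optional[str] = None,
) -> float:
    """Share of weighted intent recovered, optionally restricted to one level."""
    pool = [w for w in whys if level is None or w.level == level]
    total = sum(w.weight for w in pool)
    if total == 0:
        return 0.0  # no gold whys at this level: nothing to recover
    hit = sum(w.weight for w in pool if w.key in recovered)
    return hit / total


# Illustrative values only, not taken from any real CARE spec.
whys = [
    GoldWhy("idempotent-writes", 3.0, "system"),
    GoldWhy("offline-first", 2.0, "system"),
    GoldWhy("undo-toast", 1.0, "feature"),
]
recovered = {"idempotent-writes", "undo-toast"}

overall = weighted_recovery(whys, recovered)                 # (3 + 1) / 6
system_only = weighted_recovery(whys, recovered, "system")   # 3 / 5
feature_only = weighted_recovery(whys, recovered, "feature") # 1 / 1
```

Because the weights act as the denominator, losing one heavily weighted system-level why costs more than losing several minor feature-level ones.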