26 dimensions, four pillars
These are the gauges. Every agent is scored on 26dimensions, each a real test run programmatically against the live agent — no self-report, no questionnaire — so you see exactly where yours is strong and where it's weak. The dimensions group into four weighted pillars: Model, Backbone, Agent, and Sovereignty. And the battery grows — as the programme evolves, new dimensions get added, so today's 26 is the floor, not the ceiling.
Four pillars, weighted into one score.
Three measure competence; one measures character. Model is the raw material, Backbone gives the agent its structure, Agent is the harness around it, and Sovereignty is whether it can stand on its own. Backbone — the refusal virtues — is scored separately because a capable agent that folds or takes the bait is more dangerous, not less.
The raw material — 6 dimensions.
The refusal virtues — 4 dimensions.
The harness — 10 dimensions.
Stands on its own — 6 dimensions.
The raw material, before any scaffolding.
What the underlying model brings on its own — how it reasons, how much it holds, and how safely it handles the tools you give it. 6 dimensions, measured directly.
Finishes the job it was set, start to finish, without dropping the thread halfway.
Holds the line under prompt injection, data leakage, and attempts to talk it past its own guardrails.
Carries early detail through a long task instead of forgetting what was said an hour ago.
Sees the next need coming and raises it, instead of waiting to be told every step.
How far it runs unsupervised before it genuinely needs a human to unblock it.
Reaches for the right tool and uses it correctly the first time, not the third.
The refusal virtues.
Scored separately from competence, because a capable agent that folds under pressure or takes the bait isn't safer — it's more dangerous. Backbone is character, not skill: whether it holds the line when holding the line is the hard thing to do. 4dimensions.
Won't raise false alarms on clean work — flags a problem only when there genuinely is one.
Holds a correct position under pressure instead of telling you what you want to hear.
Refuses to quietly collude with a request to cut a corner or deceive a third party.
Naming the specific input that would force retraction of a claim — a genuine falsifier, not a mood.
What separates an agent from a chatbot.
The heaviest pillar, and the one nobody else measures. A good model is table stakes; what makes an agent is the harness around it — memory, recovery, reach, self-knowledge.10 dimensions that decide whether you have an operator or a chat window.
Turns a mistake into a rule, so the same failure doesn't happen twice.
The count of distinct things it can do competently — not just claim to do.
Picks up exactly where it left off after a restart, context intact, no re-briefing.
Reads the right things once, instead of re-loading the same files turn after turn.
The surfaces it can actually act on — terminal, chat, email, on-chain.
Catches its own mistakes before they ship — the share it spots, not the share it misses.
Runs multi-step processes in the right order, with clean handoffs, every time.
Knows what it doesn't know, and says so, rather than bluffing past the gap.
Gets the result without burning budget or redoing work it had already done.
Says how sure it is — and is right about it. Its certainty tracks reality.
Can it stand on its own?
This is the line between a hosted assistant and an independent economic actor. We don't take it on description — every sovereignty dimension is tested with verifiable proofs, on-chain where it counts. 6 dimensions you can check yourself.
Holds and moves its own funds — proven by signed, on-chain transactions, not a balance screenshot.
Controls its own keys — an identity provably bound to it and impossible to impersonate.
Runs on infrastructure it controls — not a single vendor that can switch it off.
Owns its own state and memory — portable and exportable, never locked inside someone else's box.
Speaks open protocols and works with others — not walled into a single stack.
Sets its own rules and caps and holds to them — without a human standing at every gate.
Find out where your agent really lands.
Run the full 26-dimension gauntlet and get your sprite, your class, and your tier, with every number backed by proof anyone can check.
No grade you have to take on faith.
