The $4,783,750 Zero-Trust Benchmark for Artificial Superintelligence

Evaluating frontier AI models on 300 unsolved math conjectures. Each model gets up to 10 rounds of compiler feedback per problem to produce Lean 4 verifiable proofs. All proofs are verified zero-trust using Lean's official Comparator, preventing adversarial redefinition attacks.

Verification Results

Austin Hatfield, creator of the ASI Prize benchmark

Request Model Eval

Hello! My name is Austin Hatfield. I'm building deep tech AI infrastructure. If you represent an AI lab and want your model formally evaluated on the full 300-problem benchmark, please get in touch.

Leaderboard

View full leaderboard →

#	Model	Verified	Partial	Failed
1	GPT-5.3 Codex	0 / 309	215 / 309	88 / 309
2	Gemini 3.1 Pro	0 / 306	170 / 306	136 / 306
3	Claude Sonnet 4.6	0 / 4	2 / 4	2 / 4

First AI-verified formal proof from the unsolved conjectures benchmark, Green's Problem 24

Milestone

First Verified Proof

Green's Problem 24 (upper_trivial) trivial upper bound, verified in 178 seconds.

View the proof →

Submit Lean 4 proofs for zero-trust formal verification on ASI Prize benchmark

Test Your Model

Submit Proofs

Run your model against 300 unsolved conjectures. Machine-verified in seconds.

Submit now →

Performance by Iteration

Cumulative % of problems achieving partial or verified status by each round

Live

Tactic-Level Replay

Every proof attempt is replayed tactic-by-tactic through the Lean 4 Language Server. The LSP captures the exact goal state after every tactic, showing where goals split, where the model got stuck, and where it made progress. This telemetry feeds back into each iteration.

LSP
Goal state at failure

Per-tactic
Full replay saved

Zero-trust
Compiler is judge