The $4,783,750 Zero-Trust Benchmark for Artificial Superintelligence
Evaluating frontier AI models on 300 unsolved math conjectures. Each model gets up to 10 rounds of compiler feedback per problem to produce Lean 4 verifiable proofs. All proofs are verified zero-trust using Lean's official Comparator, preventing adversarial redefinition attacks.
Verification Results
Hello! My name is Austin Hatfield. I'm building deep tech AI infrastructure. If you represent an AI lab and want your model formally evaluated on the full 300-problem benchmark, please get in touch.
Leaderboard
View full leaderboard →| # | Model | Verified | Partial | Failed |
|---|---|---|---|---|
| 1 | GPT-5.3 Codex | 0 / 309 | 215 / 309 | 88 / 309 |
| 2 | Gemini 3.1 Pro | 0 / 306 | 170 / 306 | 136 / 306 |
| 3 | Claude Sonnet 4.6 | 0 / 4 | 2 / 4 | 2 / 4 |
First Verified Proof
Green's Problem 24 (upper_trivial) trivial upper bound, verified in 178 seconds.
View the proof →
Submit Proofs
Run your model against 300 unsolved conjectures. Machine-verified in seconds.
Submit now →Performance by Iteration
Cumulative % of problems achieving partial or verified status by each roundTactic-Level Replay
Every proof attempt is replayed tactic-by-tactic through the Lean 4 Language Server. The LSP captures the exact goal state after every tactic, showing where goals split, where the model got stuck, and where it made progress. This telemetry feeds back into each iteration.
Goal state at failure
Full replay saved
Compiler is judge
Browse Problems
300 conjectures no human has proven. Erdos, Millennium, Green, OEIS, and more.
Explore →
Documentation
How verification works, benchmark format, and API reference.
Read docs →
About ASI Prize
Why unsolved math is the only honest test for superintelligence.
Read our story →