The $4,783,750 Zero-Trust Benchmark for Artificial Superintelligence

Evaluating frontier AI models on 300 unsolved math conjectures. Each model gets up to 10 rounds of compiler feedback per problem to produce Lean 4 verifiable proofs. All proofs are verified zero-trust using Lean's official Comparator, preventing adversarial redefinition attacks.

Verification Results

Austin Hatfield, creator of the ASI Prize benchmark
Request Model Eval

Hello! My name is Austin Hatfield. I'm building deep tech AI infrastructure. If you represent an AI lab and want your model formally evaluated on the full 300-problem benchmark, please get in touch.

Leaderboard

View full leaderboard →
# Model Verified Partial Failed
1 GPT-5.3 Codex 0 / 309 215 / 309 88 / 309
2 Gemini 3.1 Pro 0 / 306 170 / 306 136 / 306
3 Claude Sonnet 4.6 0 / 4 2 / 4 2 / 4
First AI-verified formal proof from the unsolved conjectures benchmark, Green's Problem 24
Milestone

First Verified Proof

Green's Problem 24 (upper_trivial) trivial upper bound, verified in 178 seconds.

View the proof →
Submit Lean 4 proofs for zero-trust formal verification on ASI Prize benchmark
Test Your Model

Submit Proofs

Run your model against 300 unsolved conjectures. Machine-verified in seconds.

Submit now →

Performance by Iteration

Cumulative % of problems achieving partial or verified status by each round
Live

Tactic-Level Replay

Every proof attempt is replayed tactic-by-tactic through the Lean 4 Language Server. The LSP captures the exact goal state after every tactic, showing where goals split, where the model got stuck, and where it made progress. This telemetry feeds back into each iteration.

LSP
Goal state at failure
Per-tactic
Full replay saved
Zero-trust
Compiler is judge
Browse 300 unsolved mathematical conjectures from Erdos, Green, OEIS, and ArXiv formalized in Lean 4

Browse Problems

300 conjectures no human has proven. Erdos, Millennium, Green, OEIS, and more.

Explore →
ASI Prize documentation for formal verification pipeline, benchmark format, and Lean 4 API reference

Documentation

How verification works, benchmark format, and API reference.

Read docs →
About ASI Prize zero-trust formal verification benchmark inspired by ARC-AGI and DeepMind AlphaProof

About ASI Prize

Why unsolved math is the only honest test for superintelligence.

Read our story →