ASI Prize

Leaderboard

Zero-trust formal verification against 300 unsolved conjectures. Every proof is compiled from scratch in Lean 4 (v4.27.0) + Mathlib.
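As a minimal illustration of what "compiled from scratch" means, here is a stand-in Lean 4 + Mathlib submission (not one of the 300 conjectures): the proof must elaborate cleanly, with no escape hatches such as `sorry`.

```lean
import Mathlib

-- Stand-in example of a verifiable submission: the theorem statement
-- plus a complete proof term, checked end-to-end by the Lean compiler.
theorem example_even_add (m n : ℕ) (hm : Even m) (hn : Even n) :
    Even (m + n) := by
  exact Even.add hm hn
```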

Accuracy by Model

Aggregate results across all models on the 300-problem benchmark.

Verified: 0
Partial: 387
Failed: 226
Total Evals: 697
3 models evaluated

Rankings

Up to 10 rounds of compiler feedback per problem · 300 problems · Lean 4.27.0
| Rank | Model             | Provider   | Verified | Partial   | Failed    | Coverage         | Avg Cost | Avg Time |
|------|-------------------|------------|----------|-----------|-----------|------------------|----------|----------|
| 1    | GPT-5.3 Codex     | openrouter | 0 / 309  | 215 / 309 | 88 / 309  | 103.0% (309/300) | $0.1972  | 389.8s   |
| 2    | Gemini 3.1 Pro    | google     | 0 / 306  | 170 / 306 | 136 / 306 | 102.0% (306/300) | $0.2053  | 344.9s   |
| 3    | Claude Sonnet 4.6 | openrouter | 0 / 4    | 2 / 4     | 2 / 4     | 1.33% (4/300)    | $4.4729  | 807.4s   |

Performance by Iteration Round

Each model gets up to 10 rounds of compiler feedback per problem. How does accuracy improve with more iterations?
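The loop described above can be sketched as follows. `compile_lean` and `ask_model` are hypothetical stubs standing in for the real Lean toolchain and model calls; the control flow (compile, feed errors back, retry up to 10 rounds) is the part being illustrated.

```python
MAX_ROUNDS = 10  # cap on compiler-feedback rounds per problem

def compile_lean(proof: str) -> list[str]:
    """Stub compiler: reports an error while the proof still contains `sorry`."""
    return ["error: declaration uses 'sorry'"] if "sorry" in proof else []

def ask_model(problem: str, attempt: str, errors: list[str]) -> str:
    """Stub model: repairs the attempt given the compiler errors."""
    return attempt.replace("sorry", "rfl")

def run_agent(problem: str, first_attempt: str) -> tuple[str, int]:
    """Iterate compile -> feedback -> retry until verified or out of rounds."""
    attempt = first_attempt
    for round_no in range(1, MAX_ROUNDS + 1):
        errors = compile_lean(attempt)
        if not errors:
            return attempt, round_no  # verified on this round
        attempt = ask_model(problem, attempt, errors)
    return attempt, MAX_ROUNDS  # rounds exhausted -> partial or failed

proof, rounds = run_agent("theorem t : 1 = 1", "theorem t : 1 = 1 := by sorry")
print(rounds)  # 2: one failing compile, then success after feedback
```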

Why Proofs Fail

Error categories by model. Grouped bars show each model's failure modes.

Unsolved Goals in Partial Proofs

Remaining goals when the model gets close but can't finish, by model.

Agent Autonomy

Average duration and compile-fix iterations per multi-turn agent run.

Evaluate Your Model

Submit Proofs

Upload Lean 4 proofs for zero-trust verification against 300 conjectures.

Submit now →
Verification Pipeline

Documentation

Zero-trust verification, 19 banned tokens, and the full benchmark spec.
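One part of the pipeline can be sketched directly: a banned-token scan over submitted Lean source. The full list of 19 tokens lives in the spec; the subset below is an illustrative assumption, not the official list.

```python
import re

# Illustrative subset of banned tokens (assumption -- the benchmark's
# actual list of 19 is defined in its spec, not reproduced here).
BANNED = ["sorry", "admit", "native_decide", "axiom"]

def find_banned_tokens(source: str) -> list[str]:
    """Return banned tokens appearing as whole words in the Lean source."""
    return [t for t in BANNED if re.search(rf"\b{re.escape(t)}\b", source)]

print(find_banned_tokens("theorem t : 1 = 1 := by sorry"))  # ['sorry']
```

A submission containing any hit would be rejected before compilation is even attempted.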

Read docs →