ASI Prize

Leaderboard

Zero-trust formal verification against 300 unsolved conjectures. Every proof compiled from scratch in Lean 4 4.27.0 + Mathlib.

Aggregate results across all models on the 300-problem benchmark.

Verified

387

Partial

226

Failed

697

Total Evals

3 models evaluated

Up to 10 rounds of compiler feedback per problem · 300 problems · Lean 4.27.0

	Model	Verified	Partial	Failed	Coverage	Avg Cost	Avg Time
1	GPT-5.3 Codex openrouter	0 / 309	215 / 309	88 / 309	103.0% (309/300)	$0.1972	389.8s
2	Gemini 3.1 Pro google	0 / 306	170 / 306	136 / 306	102.0% (306/300)	$0.2053	344.9s
3	Claude Sonnet 4.6 openrouter	0 / 4	2 / 4	2 / 4	1.33% (4/300)	$4.4729	807.4s