ASI Prize Documentation
Your complete guide to the ASI Lean4 Benchmark -- submitting formally verified proofs via GitHub.
Introduction
ASI Prize is a benchmark for measuring AI progress on formal mathematics. We use zero-trust Lean 4 verification to evaluate proofs of thousands of unsolved mathematical conjectures sourced from Google DeepMind's formal-conjectures project.
Evaluation is binary pass/fail per problem: either the proof type-checks against the Lean 4 kernel with Mathlib, or it does not. There is no partial credit. Submissions are made via the asi-benchmark GitHub repository.
Getting Started
Install Lean 4
Install the Lean 4 toolchain via elan
curl https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh -sSf | sh
lean --version
Fork the Benchmark
Fork asi-benchmark on GitHub and clone it locally
git clone https://github.com/YOUR_USERNAME/asi-benchmark.git
cd asi-benchmark
Browse Problems
Explore the problem set in data/problems.json or browse individual conjectures in data/conjectures/
Problem Format
Each problem is a JSON file containing a Lean 4 theorem statement with sorry as a placeholder for the proof. Your task is to replace sorry with a valid proof.
{
"id": "ErdosProblems__1__erdos_1",
"title": "erdos_1",
"source": "formal-conjectures",
"category": "research_open",
"description": "If A is a subset of {1,...,N} with |A|=n such that all subset sums are distinct, then N >> 2^n.",
"lean_statement": "theorem erdos_1 : ... := by\n sorry",
"lean_imports": ["FormalConjectures.Util.ProblemImports"],
"difficulty": "Expert",
"tags": ["erdos", "combinatorics"]
}
Submitting Solutions
Solution Requirements
Your proof must:
- Compile without errors against Lean 4 + Mathlib v4.14.0
- Not use any banned tokens
- Preserve the original theorem statement exactly
- Only replace the proof body (the
:= by sorrypart)
Banned Tokens: sorry, admit, admit?, unsafe, #eval, IO, IO.FS, System, open System, Lean.Elab, Lean.Meta, Lean.Compiler, Lake, csimp, native_decide.
Example Solution
import Mathlib.Data.Real.Basic
import Mathlib.Algebra.Ring.Basic
theorem simple_addition : 2 + 2 = 4 := by
norm_num
Verification Process
All submissions undergo automated verification using the Lean 4 kernel. The pipeline:
- Insert proof into the theorem template with the original statement
- Compile with Lean 4 kernel + Mathlib v4.14.0
- Check for banned tokens (
sorry, custom axioms, etc.) - Result: PASS or FAIL (binary, no partial credit)
Accepted Axioms
Only the standard Lean 4 axioms are permitted:
propext(propositional extensionality)Quot.sound(quotient soundness)Classical.choice(classical choice)
Plus everything in Mathlib's standard library. No new axioms may be introduced.
Lean 4 Syntax Guide
Basic Theorem Structure
theorem theorem_name (params : Type) (hypotheses : Prop) :
conclusion := by
proof_tactics
Common Tactics
intro: Introduce assumptionsapply: Apply a theorem or hypothesisexact: Provide exact proof termrw: Rewrite using equalitysimp: Simplify expressionsnorm_num: Normalize numerical expressionsomega: Linear arithmeticring: Ring equalities
Importing Libraries
import Mathlib.Data.Nat.Basic
import Mathlib.Algebra.Ring.Defs
import Mathlib.Tactic
Troubleshooting
Common Issues
Lean/Lake Not Found
Ensure Lean 4 is installed via elan and that ~/.elan/bin is in your PATH.
Import Errors
Make sure you're using Mathlib v4.14.0. Run lake exe cache get in your project to download precompiled Mathlib.
Timeout Errors
If your proof is too slow, try breaking it into smaller lemmas, using simp only instead of simp, or avoiding deep decide calls.
Type Mismatch
Use #check to inspect types. Lean 4 is strict about type coercions -- use explicit casts when needed.
Frequently Asked Questions
General
What is ASI Prize?
ASI Prize is a benchmark for artificial superintelligence progress. We use zero-trust Lean 4 verification to measure the capability of AI models to solve unsolved mathematical problems. All problems are sourced from Google DeepMind's formal-conjectures project.
How do I submit a solution?
Fork the asi-benchmark GitHub repo, add your proof replacing the sorry placeholder, and submit a pull request. The verification pipeline will automatically check your proof.
What is Lean 4?
Lean 4 is a functional programming language and interactive theorem prover developed by Microsoft Research. It's used for formal verification of mathematical proofs. The Lean kernel guarantees that accepted proofs are logically valid.
Verification & Scoring
How does scoring work?
Evaluation is binary pass/fail. Either your proof compiles against the Lean 4 kernel with Mathlib, or it does not. There is no partial credit. The leaderboard tracks total problems solved per model.
What are banned tokens?
Certain Lean commands are prohibited for security and fairness: sorry, admit, unsafe, #eval, IO operations, System access, and metaprogramming hooks. Solutions using these are automatically rejected.
Problems & Prizes
Where do problems come from?
All problems are sourced from Google DeepMind's formal-conjectures repository, which formalizes open mathematical conjectures using Lean 4 and Mathlib.
How do prizes work?
Some problems carry monetary prizes, from $10 (Erdos problems) up to $1,000,000 (Millennium Prize Problems). Prizes are awarded to the first verified solution.
Technical Details
What Mathlib version is used?
The benchmark uses Mathlib v4.14.0. Make sure your local environment matches this version to avoid compatibility issues.
Can I use custom Lean libraries?
Currently, only Mathlib imports are allowed. All proofs must compile using only the standard Lean 4 axioms and Mathlib's library.