Documentation · ASI Prize

Introduction

ASI Prize is a benchmark for measuring AI progress on formal mathematics. We use zero-trust Lean 4 verification to evaluate proofs of thousands of unsolved mathematical conjectures sourced from Google DeepMind's formal-conjectures project.

Evaluation is binary pass/fail per problem: either the proof type-checks against the Lean 4 kernel with Mathlib, or it does not. There is no partial credit. Submissions are made via the asi-benchmark GitHub repository.

Getting Started

1

Install Lean 4

Install the Lean 4 toolchain via elan

curl https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh -sSf | sh
lean --version

2

Fork the Benchmark

Fork asi-benchmark on GitHub and clone it locally

git clone https://github.com/YOUR_USERNAME/asi-benchmark.git
cd asi-benchmark

3

Browse Problems

Explore the problem set in data/problems.json or browse individual conjectures in data/conjectures/

Problem Format

Each problem is a JSON file containing a Lean 4 theorem statement with sorry as a placeholder for the proof. Your task is to replace sorry with a valid proof.

{
  "id": "ErdosProblems__1__erdos_1",
  "title": "erdos_1",
  "source": "formal-conjectures",
  "category": "research_open",
  "description": "If A is a subset of {1,...,N} with |A|=n such that all subset sums are distinct, then N >> 2^n.",
  "lean_statement": "theorem erdos_1 : ... := by\n  sorry",
  "lean_imports": ["FormalConjectures.Util.ProblemImports"],
  "difficulty": "Expert",
  "tags": ["erdos", "combinatorics"]
}

Submitting Solutions

Fork on GitHub

Fork the benchmark repo and submit your proofs via pull request

Solution Requirements

Your proof must:

Compile without errors against Lean 4 + Mathlib v4.14.0
Not use any banned tokens
Preserve the original theorem statement exactly
Only replace the proof body (the := by sorry part)

Banned Tokens: sorry, admit, admit?, unsafe, #eval, IO, IO.FS, System, open System, Lean.Elab, Lean.Meta, Lean.Compiler, Lake, csimp, native_decide.

Example Solution

import Mathlib.Data.Real.Basic
import Mathlib.Algebra.Ring.Basic

theorem simple_addition : 2 + 2 = 4 := by
  norm_num

Verification Process

All submissions undergo automated verification using the Lean 4 kernel. The pipeline:

Insert proof into the theorem template with the original statement
Compile with Lean 4 kernel + Mathlib v4.14.0
Check for banned tokens (sorry, custom axioms, etc.)
Result: PASS or FAIL (binary, no partial credit)

Accepted Axioms

Only the standard Lean 4 axioms are permitted:

propext (propositional extensionality)
Quot.sound (quotient soundness)
Classical.choice (classical choice)

Plus everything in Mathlib's standard library. No new axioms may be introduced.

Lean 4 Syntax Guide

Basic Theorem Structure

theorem theorem_name (params : Type) (hypotheses : Prop) :
  conclusion := by
  proof_tactics

Common Tactics

intro: Introduce assumptions
apply: Apply a theorem or hypothesis
exact: Provide exact proof term
rw: Rewrite using equality
simp: Simplify expressions
norm_num: Normalize numerical expressions
omega: Linear arithmetic
ring: Ring equalities

Importing Libraries

import Mathlib.Data.Nat.Basic
import Mathlib.Algebra.Ring.Defs
import Mathlib.Tactic

Troubleshooting

Common Issues

Lean/Lake Not Found

Ensure Lean 4 is installed via elan and that ~/.elan/bin is in your PATH.

Import Errors

Make sure you're using Mathlib v4.14.0. Run lake exe cache get in your project to download precompiled Mathlib.

Timeout Errors

If your proof is too slow, try breaking it into smaller lemmas, using simp only instead of simp, or avoiding deep decide calls.

Type Mismatch

Use #check to inspect types. Lean 4 is strict about type coercions -- use explicit casts when needed.

Frequently Asked Questions

General

What is ASI Prize?

ASI Prize is a benchmark for artificial superintelligence progress. We use zero-trust Lean 4 verification to measure the capability of AI models to solve unsolved mathematical problems. All problems are sourced from Google DeepMind's formal-conjectures project.

How do I submit a solution?

Fork the asi-benchmark GitHub repo, add your proof replacing the sorry placeholder, and submit a pull request. The verification pipeline will automatically check your proof.

What is Lean 4?

Lean 4 is a functional programming language and interactive theorem prover developed by Microsoft Research. It's used for formal verification of mathematical proofs. The Lean kernel guarantees that accepted proofs are logically valid.

Verification & Scoring

How does scoring work?

Evaluation is binary pass/fail. Either your proof compiles against the Lean 4 kernel with Mathlib, or it does not. There is no partial credit. The leaderboard tracks total problems solved per model.

What are banned tokens?

Certain Lean commands are prohibited for security and fairness: sorry, admit, unsafe, #eval, IO operations, System access, and metaprogramming hooks. Solutions using these are automatically rejected.

Problems & Prizes

Where do problems come from?

All problems are sourced from Google DeepMind's formal-conjectures repository, which formalizes open mathematical conjectures using Lean 4 and Mathlib.

How do prizes work?

Some problems carry monetary prizes, from $10 (Erdos problems) up to $1,000,000 (Millennium Prize Problems). Prizes are awarded to the first verified solution.

Technical Details

What Mathlib version is used?

The benchmark uses Mathlib v4.14.0. Make sure your local environment matches this version to avoid compatibility issues.

Can I use custom Lean libraries?

Currently, only Mathlib imports are allowed. All proofs must compile using only the standard Lean 4 axioms and Mathlib's library.

ASI Prize Documentation