ASI Prize Documentation

Your complete guide to the ASI Lean4 Benchmark -- submitting formally verified proofs via GitHub.

Introduction

ASI Prize is a benchmark for measuring AI progress on formal mathematics. We use zero-trust Lean 4 verification to evaluate proofs of thousands of unsolved mathematical conjectures sourced from Google DeepMind's formal-conjectures project.

Evaluation is binary pass/fail per problem: either the proof type-checks against the Lean 4 kernel with Mathlib, or it does not. There is no partial credit. Submissions are made via the asi-benchmark GitHub repository.

Getting Started

1

Install Lean 4

Install the Lean 4 toolchain via elan

curl https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh -sSf | sh
lean --version
2

Fork the Benchmark

Fork asi-benchmark on GitHub and clone it locally

git clone https://github.com/YOUR_USERNAME/asi-benchmark.git
cd asi-benchmark
3

Browse Problems

Explore the problem set in data/problems.json or browse individual conjectures in data/conjectures/

Problem Format

Each problem is a JSON file containing a Lean 4 theorem statement with sorry as a placeholder for the proof. Your task is to replace sorry with a valid proof.

{
  "id": "ErdosProblems__1__erdos_1",
  "title": "erdos_1",
  "source": "formal-conjectures",
  "category": "research_open",
  "description": "If A is a subset of {1,...,N} with |A|=n such that all subset sums are distinct, then N >> 2^n.",
  "lean_statement": "theorem erdos_1 : ... := by\n  sorry",
  "lean_imports": ["FormalConjectures.Util.ProblemImports"],
  "difficulty": "Expert",
  "tags": ["erdos", "combinatorics"]
}

Submitting Solutions

Solution Requirements

Your proof must:

  1. Compile without errors against Lean 4 + Mathlib v4.14.0
  2. Not use any banned tokens
  3. Preserve the original theorem statement exactly
  4. Only replace the proof body (the := by sorry part)

Banned Tokens: sorry, admit, admit?, unsafe, #eval, IO, IO.FS, System, open System, Lean.Elab, Lean.Meta, Lean.Compiler, Lake, csimp, native_decide.

Example Solution

import Mathlib.Data.Real.Basic
import Mathlib.Algebra.Ring.Basic

theorem simple_addition : 2 + 2 = 4 := by
  norm_num

Verification Process

All submissions undergo automated verification using the Lean 4 kernel. The pipeline:

  1. Insert proof into the theorem template with the original statement
  2. Compile with Lean 4 kernel + Mathlib v4.14.0
  3. Check for banned tokens (sorry, custom axioms, etc.)
  4. Result: PASS or FAIL (binary, no partial credit)

Accepted Axioms

Only the standard Lean 4 axioms are permitted:

  • propext (propositional extensionality)
  • Quot.sound (quotient soundness)
  • Classical.choice (classical choice)

Plus everything in Mathlib's standard library. No new axioms may be introduced.

Lean 4 Syntax Guide

Basic Theorem Structure

theorem theorem_name (params : Type) (hypotheses : Prop) :
  conclusion := by
  proof_tactics

Common Tactics

  • intro: Introduce assumptions
  • apply: Apply a theorem or hypothesis
  • exact: Provide exact proof term
  • rw: Rewrite using equality
  • simp: Simplify expressions
  • norm_num: Normalize numerical expressions
  • omega: Linear arithmetic
  • ring: Ring equalities

Importing Libraries

import Mathlib.Data.Nat.Basic
import Mathlib.Algebra.Ring.Defs
import Mathlib.Tactic

Troubleshooting

Common Issues

Lean/Lake Not Found

Ensure Lean 4 is installed via elan and that ~/.elan/bin is in your PATH.

Import Errors

Make sure you're using Mathlib v4.14.0. Run lake exe cache get in your project to download precompiled Mathlib.

Timeout Errors

If your proof is too slow, try breaking it into smaller lemmas, using simp only instead of simp, or avoiding deep decide calls.

Type Mismatch

Use #check to inspect types. Lean 4 is strict about type coercions -- use explicit casts when needed.

Frequently Asked Questions

General

What is ASI Prize?

ASI Prize is a benchmark for artificial superintelligence progress. We use zero-trust Lean 4 verification to measure the capability of AI models to solve unsolved mathematical problems. All problems are sourced from Google DeepMind's formal-conjectures project.

How do I submit a solution?

Fork the asi-benchmark GitHub repo, add your proof replacing the sorry placeholder, and submit a pull request. The verification pipeline will automatically check your proof.

What is Lean 4?

Lean 4 is a functional programming language and interactive theorem prover developed by Microsoft Research. It's used for formal verification of mathematical proofs. The Lean kernel guarantees that accepted proofs are logically valid.

Verification & Scoring

How does scoring work?

Evaluation is binary pass/fail. Either your proof compiles against the Lean 4 kernel with Mathlib, or it does not. There is no partial credit. The leaderboard tracks total problems solved per model.

What are banned tokens?

Certain Lean commands are prohibited for security and fairness: sorry, admit, unsafe, #eval, IO operations, System access, and metaprogramming hooks. Solutions using these are automatically rejected.

Problems & Prizes

Where do problems come from?

All problems are sourced from Google DeepMind's formal-conjectures repository, which formalizes open mathematical conjectures using Lean 4 and Mathlib.

How do prizes work?

Some problems carry monetary prizes, from $10 (Erdos problems) up to $1,000,000 (Millennium Prize Problems). Prizes are awarded to the first verified solution.

Technical Details

What Mathlib version is used?

The benchmark uses Mathlib v4.14.0. Make sure your local environment matches this version to avoid compatibility issues.

Can I use custom Lean libraries?

Currently, only Mathlib imports are allowed. All proofs must compile using only the standard Lean 4 axioms and Mathlib's library.

Copied to clipboard!