This paper investigates the use of LLMs to generate formal proofs in Lean for solving open mathematical problems. The authors evaluated this approach at large scale, targeting Erdős problems and OEIS conjectures. Their most capable agent autonomously solved 9 of 353 open Erdős problems and 44 of 492 OEIS conjectures, demonstrating the potential of AI-aided formal proof search.
AI can now autonomously solve open math problems, cracking 9 Erdős problems and 44 OEIS conjectures at a reasonable cost.
Large language models (LLMs) increasingly excel at mathematical reasoning, but their unreliability limits their utility in mathematics research. A mitigation is using LLMs to generate formal proofs in languages like Lean. We perform the first large-scale evaluation of this method's ability to solve open problems. Our most capable agent autonomously resolved 9 of 353 open Erdős problems at the per-problem cost of a few hundred dollars, proved 44/492 OEIS conjectures, and is being deployed in combinatorics, optimization, graph theory, algebraic geometry, and quantum optics research. A basic agent alternating LLM-based generation with Lean-based verification replicated the Erdős successes but proved costlier on the hardest problems. These findings demonstrate the power of AI-aided formal proof search and shed light on the agent designs that enable it.
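The "basic agent" described in the abstract alternates between generating a candidate proof and checking it. A minimal sketch of such a loop might look like the following. This is an illustration and not the paper's implementation: `proof_search`, `CheckResult`, `toy_generate`, and `toy_verify` are hypothetical names, and the two toy functions only stand in for a real LLM call and a real Lean check.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    ok: bool        # True if the verifier accepted the candidate proof
    feedback: str   # verifier messages to hand back to the generator


def proof_search(
    statement: str,
    generate: Callable[[str, str], str],
    verify: Callable[[str, str], CheckResult],
    max_rounds: int = 8,
) -> Optional[str]:
    """Alternate generation and verification until a proof is accepted.

    Returns the accepted proof text, or None if the round budget runs out.
    """
    feedback = ""
    for _ in range(max_rounds):
        candidate = generate(statement, feedback)
        result = verify(statement, candidate)
        if result.ok:
            return candidate
        # Feed the verifier's complaints into the next generation attempt.
        feedback = result.feedback
    return None


# Toy stand-ins (assumptions for illustration only): a real agent would
# call an LLM in place of toy_generate and a Lean toolchain in place of
# toy_verify.
def toy_generate(statement: str, feedback: str) -> str:
    return "rfl" if feedback else "sorry"


def toy_verify(statement: str, candidate: str) -> CheckResult:
    if candidate == "rfl":
        return CheckResult(ok=True, feedback="")
    return CheckResult(ok=False, feedback="proof contains sorry")


if __name__ == "__main__":
    print(proof_search("theorem t : 1 + 1 = 2", toy_generate, toy_verify))
```

In this sketch only the verifier decides whether a proof is accepted, so an unreliable generator cannot produce a false positive. The round budget `max_rounds` is the main cost control, which is where the abstract's observation about cost on the hardest problems would show up.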