MaxProof is a test-time scaling framework for competition-level mathematical proof in which a single model handles proof generation, verification, and critique-conditioned repair. Training relies on a generative verifier engineered for a low false-positive rate, and at test time the framework searches over a population of candidate proofs and picks the final one by tournament selection. With MaxProof, the M3 model scores 35/42 on IMO 2025 and 36/42 on USAMO 2026, above the human gold-medal threshold on both.
MaxProof's test-time scaling lifts the M3 model above the human gold-medal threshold on IMO 2025 and USAMO 2026.
We present MaxProof, a population-level test-time scaling framework for competition-level mathematical proof in the MiniMax-M3 series. M3 first trains three proof-oriented capabilities -- proof generation, proof verification, and critique-conditioned proof repair -- using a defense-in-depth generative verifier engineered for low false-positive rate. These capabilities are merged into a single released M3 model. At test time, MaxProof treats the model as a generator, verifier, refiner, and ranker, searches over a population of candidate proofs, and returns one final proof through tournament selection. With MaxProof test-time scaling, the M3 model reaches 35/42 on IMO 2025 and 36/42 on USAMO 2026, exceeding the human gold-medal threshold on both.
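The test-time loop described in the abstract (generate a population, verify, repair from critiques, then select one proof by tournament) can be illustrated with a minimal Python sketch. This is not the MaxProof implementation: the function `population_search`, the `Candidate` type, the four callables, the population size, the number of repair rounds, and the single-elimination bracket are all hypothetical choices made for illustration. The abstract does not specify any of them. In the real system, one M3 model would play all four roles; here they are plain callables so the control flow can be run with stubs.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    proof: str
    # Critique text from the verifier; None means the proof was accepted.
    critique: Optional[str] = None


def population_search(
    problem: str,
    generate: Callable[[str], str],                  # hypothetical: problem -> proof
    verify: Callable[[str, str], Optional[str]],     # hypothetical: -> critique or None
    repair: Callable[[str, str, str], str],          # hypothetical: critique-conditioned repair
    compare: Callable[[str, str, str], bool],        # hypothetical: True if first proof is preferred
    population_size: int = 8,                        # assumed value, not from the source
    repair_rounds: int = 2,                          # assumed value, not from the source
    rng: Optional[random.Random] = None,
) -> str:
    rng = rng or random.Random(0)

    # 1. Generator role: sample an initial population of candidate proofs.
    population: List[Candidate] = [
        Candidate(generate(problem)) for _ in range(population_size)
    ]

    # 2. Verifier role: attach a critique to every candidate.
    for cand in population:
        cand.critique = verify(problem, cand.proof)

    # 3. Refiner role: repair rejected candidates using their critiques,
    #    then re-verify the repaired proof.
    for _ in range(repair_rounds):
        for cand in population:
            if cand.critique is not None:
                cand.proof = repair(problem, cand.proof, cand.critique)
                cand.critique = verify(problem, cand.proof)

    # Prefer verified proofs; fall back to the full population if none passed.
    pool = [c for c in population if c.critique is None] or list(population)

    # 4. Ranker role: single-elimination tournament (an assumed bracket format).
    rng.shuffle(pool)
    while len(pool) > 1:
        next_round: List[Candidate] = []
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            next_round.append(a if compare(problem, a.proof, b.proof) else b)
        if len(pool) % 2 == 1:
            next_round.append(pool[-1])  # odd candidate out gets a bye
        pool = next_round

    return pool[0].proof
```

With toy stubs (for example, a verifier that accepts any string ending in "ok" and a comparator that prefers the shorter string), the function returns the shortest accepted candidate. Keeping the four roles as separate callables mirrors the abstract's description of one model used as generator, verifier, refiner, and ranker, while leaving the actual prompting and scoring unspecified.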