Tsinghua AI, Ant Group, CUHK, Fudan, MiniMax, PKU

Jun 11, 2026

arXiv:2606.13473

MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling

Jiacheng Chen, Xinyu Zhang, Shunkai Zhang, Yanmohan Wang, Lin Li, Tiancheng Qin, Qin Wang, Zhengmao Zhu, Tianle Li, Jingyang Li, Zehan Li, Binyan Jiang, Binyang Jiang, Jin Zhu, Jin-Feng Zhu, Han Ding, Fei Yu, F. Yu, Chenyu Du, Zijian Song, Jiayuan Song, Zhi Zhang, Yunan Huang, Weiyu Cheng, Pengyu Zhao, Yuntao Cheng, Yu Cheng

AI Summary

MaxProof is a population-level test-time scaling framework for competition-level mathematical proof. The M3 model is trained on three capabilities: proof generation, proof verification, and critique-conditioned proof repair, using a generative verifier engineered for a low false-positive rate. These capabilities are merged into a single model. At test time, MaxProof searches over a population of candidate proofs and returns one final proof through tournament selection. With this procedure, M3 reaches 35/42 on IMO 2025 and 36/42 on USAMO 2026, which the abstract reports as exceeding the human gold-medal threshold on both.

Key Contribution

MaxProof's population-level test-time scaling lets a single M3 model score above the human gold-medal threshold on IMO 2025 (35/42) and USAMO 2026 (36/42), as reported in the abstract.

Abstract

We present MaxProof, a population-level test-time scaling framework for competition-level mathematical proof in the MiniMax-M3 series. M3 first trains three proof-oriented capabilities -- proof generation, proof verification, and critique-conditioned proof repair -- using a defense-in-depth generative verifier engineered for low false-positive rate. These capabilities are merged into a single released M3 model. At test time, MaxProof treats the model as a generator, verifier, refiner, and ranker, searches over a population of candidate proofs, and returns one final proof through tournament selection. With MaxProof test-time scaling, the M3 model reaches 35/42 on IMO 2025 and 36/42 on USAMO 2026, exceeding the human gold-medal threshold on both.
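The test-time loop described in the abstract (generate a population, verify, repair from critiques, then pick one proof by tournament) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the population size, the number of repair rounds, or the tournament format, so `population_size`, `rounds`, and the single-elimination bracket are all assumptions. The `generate`, `verify`, `refine`, and `compare` callables are hypothetical stand-ins for four prompts to the same model, and `make_toy_model` only returns string-based stubs so the sketch can run.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    proof: str
    critique: str = ""
    accepted: bool = False


def population_search(problem, generate, verify, refine, compare,
                      population_size=8, rounds=2, seed=0):
    """Hypothetical population-level search ending in a tournament.

    generate(problem) -> proof
    verify(problem, proof) -> (accepted, critique)
    refine(problem, proof, critique) -> repaired proof
    compare(problem, proof_a, proof_b) -> True if proof_a is preferred
    """
    rng = random.Random(seed)
    population = [Candidate(generate(problem)) for _ in range(population_size)]

    # Critique-conditioned repair: rejected proofs are rewritten from the critique.
    for _ in range(rounds):
        for cand in population:
            cand.accepted, cand.critique = verify(problem, cand.proof)
            if not cand.accepted:
                cand.proof = refine(problem, cand.proof, cand.critique)

    # Final verification pass so the flags match the current proof text.
    for cand in population:
        cand.accepted, cand.critique = verify(problem, cand.proof)

    # Prefer verifier-accepted proofs; fall back to everyone if none pass.
    pool = [c for c in population if c.accepted] or list(population)

    # Single-elimination tournament (assumed format); odd entrant gets a bye.
    rng.shuffle(pool)
    while len(pool) > 1:
        winners = []
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            winners.append(a if compare(problem, a.proof, b.proof) else b)
        if len(pool) % 2 == 1:
            winners.append(pool[-1])
        pool = winners
    return pool[0]


def make_toy_model():
    """String-based stubs standing in for the model's four roles."""
    counter = itertools.count()

    def generate(problem):
        n = next(counter)
        return f"draft{n}" + " [gap]" * (n % 3)

    def verify(problem, proof):
        ok = "[gap]" not in proof
        return ok, ("" if ok else "unjustified step")

    def refine(problem, proof, critique):
        return proof.replace(" [gap]", "", 1)  # repair one gap per call

    def compare(problem, proof_a, proof_b):
        return len(proof_a) <= len(proof_b)  # toy ranker: shorter wins

    return generate, verify, refine, compare
```

Calling `population_search("problem", *make_toy_model(), population_size=6)` returns a single `Candidate`. Filtering to verifier-accepted proofs before the tournament is one way a low false-positive verifier could pay off: the ranker then only has to choose among proofs that already passed, rather than separate correct proofs from flawed ones.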

Reasoning & Chain-of-Thought, Scaling Laws & Emergent Abilities

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
