Google Research | Feb 24, 2026 | arXiv:2602.21201

Aletheia tackles FirstProof autonomously

Tony Feng, Junehyuk Jung, Sang-hyun Kim, Carlo Pagano, Sergei Gukov, Chiang-Chiang Tsai, David P. Woodruff, Adel Javanmard, Aryan Mokhtari, Dawsen Hwang, Yuri Chervonyi, Jonathan N. Lee, Garrett Bingham, Trieu H. Trinh, Vahab Mirrokni, Quoc V. Le, Thang Luong

AI Summary

Aletheia, a mathematics research agent based on Gemini 3 Deep Think, was evaluated on the inaugural FirstProof challenge. Working autonomously within the challenge's allowed timeframe, it solved 6 of the 10 problems (2, 5, 7, 8, 9, 10) according to majority expert assessments, with experts disagreeing only on Problem 8. The result suggests that AI agents may be able to contribute to mathematical research and automated theorem proving.

Key Contribution

Aletheia, a research agent powered by Gemini 3 Deep Think, autonomously solved 6 of the 10 problems in the inaugural FirstProof challenge according to majority expert assessments, with raw prompts and outputs released for transparency.

Abstract

We report the performance of Aletheia (Feng et al., 2026b), a mathematics research agent powered by Gemini 3 Deep Think, on the inaugural FirstProof challenge. Within the allowed timeframe of the challenge, Aletheia autonomously solved 6 problems (2, 5, 7, 8, 9, 10) out of 10 according to majority expert assessments; we note that experts were not unanimous on Problem 8 (only). For full transparency, we explain our interpretation of FirstProof and disclose details about our experiments as well as our evaluation. Raw prompts and outputs are available at https://github.com/google-deepmind/superhuman/tree/main/aletheia.

Eval Frameworks & Benchmarks | Reasoning & Chain-of-Thought | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 9

Year: 2026

Venue: N/A
