Ant Group | Mar 2, 2026 | arXiv:2603.01692

Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search

Yifei Zhang, Xu Yang, Bowen Xian, Qizheng Li, Shikai Fang, Jingyuan Li, Jian Wang, Mingrui Xu, Weiqing Liu, Jiang Bian

AI Summary

The paper introduces Gome, an MLE agent that treats agent improvement as gradient-based optimization: structured diagnostic reasoning plays the role of gradient computation, success memory the role of momentum, and multi-trace execution the role of distributed optimization. On MLE-Bench, Gome outperforms tree search when paired with stronger LLMs, while tree search retains an advantage with weaker models. Scaling experiments across 10 models show a crossover point beyond which the gradient-based approach pulls ahead, with the gap widening at frontier-tier models.

Key Contribution

As LLMs get smarter, ditching tree search for gradient-based optimization in MLE agents unlocks significant performance gains, especially with frontier-tier models.

Abstract

LLM-based agents for machine learning engineering (MLE) predominantly rely on tree search, a form of gradient-free optimization that uses scalar validation scores to rank candidates. As LLM reasoning capabilities improve, exhaustive enumeration becomes increasingly inefficient compared to directed updates, analogous to how accurate gradients enable efficient descent over random search. We introduce Gome, an MLE agent that operationalizes gradient-based optimization. Gome maps structured diagnostic reasoning to gradient computation, success memory to momentum, and multi-trace execution to distributed optimization. Under a closed-world protocol that isolates architectural effects from external knowledge, Gome achieves a state-of-the-art 35.1% any-medal rate on MLE-Bench with a restricted 12-hour budget on a single V100 GPU. Scaling experiments across 10 models reveal a critical crossover: with weaker models, tree search retains advantages by compensating for unreliable reasoning through exhaustive exploration; as reasoning capability strengthens, gradient-based optimization progressively outperforms, with the gap widening at frontier-tier models. Given the rapid advancement of reasoning-oriented LLMs, this positions gradient-based optimization as an increasingly favorable paradigm. We release our codebase and GPT-5 traces.
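The analogy the abstract draws, that accurate gradients make descent more efficient than random search, can be illustrated numerically. The sketch below is only that numerical analogy on a toy quadratic, not Gome's algorithm: in Gome the "gradient" is structured diagnostic reasoning produced by an LLM, not a derivative. All names here (`loss`, `grad`, `momentum_descent`, `random_search`) and all hyperparameters are hypothetical choices made for illustration.

```python
import random


def loss(x):
    # Toy objective standing in for a validation loss; its minimum is at the origin.
    return sum(v * v for v in x)


def grad(x):
    # Exact gradient of the toy objective (the "accurate gradient" case).
    return [2.0 * v for v in x]


def momentum_descent(x0, steps, lr=0.1, beta=0.5):
    # Directed updates: each step follows the gradient, and a velocity term
    # carries information from earlier steps (the momentum in the analogy).
    x = list(x0)
    velocity = [0.0] * len(x)
    for _ in range(steps):
        g = grad(x)
        velocity = [beta * v + gi for v, gi in zip(velocity, g)]
        x = [xi - lr * vi for xi, vi in zip(x, velocity)]
    return loss(x)


def random_search(x0, steps, sigma=0.5, seed=0):
    # Gradient-free baseline: propose a random perturbation of the best point
    # and keep it only if its scalar score improves.
    rng = random.Random(seed)
    best = list(x0)
    best_loss = loss(best)
    for _ in range(steps):
        cand = [b + rng.gauss(0.0, sigma) for b in best]
        cand_loss = loss(cand)
        if cand_loss < best_loss:
            best, best_loss = cand, cand_loss
    return best_loss


if __name__ == "__main__":
    start = [3.0] * 10
    print("initial loss:    ", loss(start))
    print("momentum descent:", momentum_descent(start, steps=50))
    print("random search:   ", random_search(start, steps=50))
```

With the same budget of 50 evaluations, the directed method drives the loss close to zero while the score-only baseline improves far more slowly. The comparison assumes the gradient is exact; the abstract's crossover claim corresponds to the case where that assumption fails, which this toy does not model.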

Code Generation & Program Synthesis; Reasoning & Chain-of-Thought; Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
