Feb 3, 2026 · arXiv:2602.03279

Agentic Proposing: Enhancing Large Language Model Reasoning via Compositional Skill Synthesis

AI Summary

The paper introduces Agentic Proposing, a framework for synthesizing high-quality reasoning datasets by modeling problem generation as a sequential decision process in which an agent composes modular reasoning skills. The authors train Agentic-Proposer-4B with Multi-Granularity Policy Optimization (MGPO) to generate verifiable training trajectories across math, coding, and science domains. Experiments show that solvers trained on agent-synthesized data outperform baselines and generalize across domains, with a 30B solver achieving 91.6% on AIME25 using only 11,000 synthesized trajectories.

Key Contribution

Forget massive human-curated datasets: a small, agent-synthesized dataset can train a 30B model to rival GPT-5 on AIME25.

Abstract

Advancing complex reasoning in large language models relies on high-quality, verifiable datasets, yet human annotation remains cost-prohibitive and difficult to scale. Current synthesis paradigms often face a recurring trade-off: maintaining structural validity typically restricts problem complexity, while relaxing constraints to increase difficulty frequently leads to inconsistent or unsolvable instances. To address this, we propose Agentic Proposing, a framework that models problem synthesis as a goal-driven sequential decision process where a specialized agent dynamically selects and composes modular reasoning skills. Through an iterative workflow of internal reflection and tool-use, we develop the Agentic-Proposer-4B using Multi-Granularity Policy Optimization (MGPO) to generate high-precision, verifiable training trajectories across mathematics, coding, and science. Empirical results demonstrate that downstream solvers trained on agent-synthesized data significantly outperform leading baselines and exhibit robust cross-domain generalization. Notably, a 30B solver trained on only 11,000 synthesized trajectories achieves a state-of-the-art 91.6% accuracy on AIME25, rivaling frontier-scale proprietary models such as GPT-5 and proving that a small volume of high-quality synthetic signals can effectively substitute for massive human-curated datasets.

Citation Metrics

Citations: 0

Influential citations: 0

References: 27

Year: 2026

Venue: N/A
