This paper introduces Atomic Decomposition and Recombination (ADR), a framework for scaling Reinforcement Learning with Verifiable Rewards (RLVR) by synthesizing challenging, verifiable code tasks: code tasks are decomposed into atomic elements, which are then recombined in a controlled way. The authors report that ADR outperforms existing baselines in originality, difficulty, diversity, and test quality, and that RLVR training on its tasks improves the coding abilities of Large Language Models (LLMs) across domains including algorithmic programming, tool usage, and data science. The work targets a key bottleneck in RLVR: the scarcity of sufficiently challenging verifiable tasks.
ADR produces genuinely novel and challenging coding problems, and training LLMs on them improves their coding performance.
Reinforcement Learning with Verifiable Rewards (RLVR) has recently emerged as the cornerstone for shaping the remarkable coding abilities of Large Language Models (LLMs). However, the scalability of RLVR is severely constrained by the scarcity of sufficiently challenging verifiable code tasks near the model's edge of competence. Prior studies often rely on heuristic seed expansion for data synthesis, which severely limits both novelty and difficulty; consequently, the training value of such data fails to scale proportionally with the amount synthesized. To this end, we propose Atomic Decomposition and Recombination (ADR), a novel framework that decomposes code tasks into atomic elements and recombines them in a controlled way, thereby generating genuinely novel and challenging verifiable code tasks. Experiments and analysis demonstrate that ADR achieves superior originality, difficulty, diversity, and test quality over existing baselines, and that RLVR training on its tasks consistently delivers greater improvements in coding ability across diverse downstream domains, including algorithmic programming, tool usage, and data science. Our work sheds light on a new paradigm for novel code task synthesis and scalable RLVR training.
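The abstract describes ADR only at a high level. The following is a minimal Python sketch of the general decompose-then-recombine idea, under loudly stated assumptions: each seed task is assumed to be labeled with a set of atomic elements (e.g. "hash map", "graph"), a combination counts as "novel" if no seed already covers it, the number of atoms per task serves as a crude difficulty knob, and a binary pass-all-tests reward plus a pass-rate filter stands in for "verifiable rewards" and "edge of competence". The names `Task`, `decompose`, `recombine`, `verifiable_reward`, and `at_edge_of_competence` are hypothetical; this is not the authors' implementation.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Sequence, Set, Tuple


@dataclass(frozen=True)
class Task:
    """A seed task described by the atomic elements it exercises (hypothetical schema)."""
    name: str
    atoms: FrozenSet[str]


def decompose(seeds: Sequence[Task]) -> Tuple[List[str], Set[FrozenSet[str]]]:
    """Collect the atom vocabulary and the atom combinations the seeds already cover."""
    vocabulary: Set[str] = set()
    covered: Set[FrozenSet[str]] = set()
    for task in seeds:
        vocabulary |= task.atoms
        covered.add(task.atoms)
    return sorted(vocabulary), covered


def recombine(vocabulary: List[str], covered: Set[FrozenSet[str]],
              k: int, n: int, rng: random.Random) -> List[FrozenSet[str]]:
    """Sample up to n k-atom combinations that are not a subset of any seed's atoms.

    Assumption: larger k means more interacting requirements, i.e. harder tasks.
    """
    candidates = [frozenset(c) for c in itertools.combinations(vocabulary, k)]
    novel = [c for c in candidates if not any(c <= seen for seen in covered)]
    rng.shuffle(novel)
    return novel[:n]


def verifiable_reward(solution: Callable, tests: Sequence[Tuple[tuple, object]]) -> float:
    """Binary reward: 1.0 only if the candidate passes every (args, expected) test."""
    try:
        return float(all(solution(*args) == expected for args, expected in tests))
    except Exception:
        return 0.0  # a crashing candidate earns no reward


def at_edge_of_competence(rewards: Sequence[float],
                          low: float = 0.0, high: float = 1.0) -> bool:
    """Keep a task only if sampled solutions sometimes, but not always, pass."""
    pass_rate = sum(rewards) / len(rewards)
    return low < pass_rate < high


if __name__ == "__main__":
    seeds = [
        Task("two-sum", frozenset({"hash map", "array"})),
        Task("shortest-path", frozenset({"graph", "priority queue"})),
        Task("sliding-window-max", frozenset({"array", "deque"})),
    ]
    vocab, covered = decompose(seeds)
    for atoms in recombine(vocab, covered, k=2, n=3, rng=random.Random(0)):
        print(sorted(atoms))  # a new atom combination to turn into a task
```

Treating novelty as "this atom combination is not contained in any seed task" is the simplest possible proxy. A real pipeline would still have to write a problem statement and tests for each combination, which this sketch does not attempt.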