This paper decomposes long-context reasoning in LLMs into atomic skills and synthesizes a pseudo dataset targeting each skill. The model is then trained on these datasets with reinforcement learning to improve its proficiency in the atomic skills. Experiments show that this approach improves average performance on long-context reasoning benchmarks by 7.7 percentage points (from 46.3% to 54.0%) over a strong baseline.
Instead of treating long-context reasoning as one monolithic task, breaking it into atomic skills and training on targeted pseudo data yields an average gain of 7.7 percentage points.
Long-context reasoning is essential for complex real-world applications, yet remains a significant challenge for Large Language Models (LLMs). Despite rapid progress in long-context reasoning, current research often overlooks the internal complexity of the task itself. In this paper, we move beyond this holistic view and decompose long-context reasoning into a set of fundamental atomic skills. We then automatically synthesize a suite of pseudo datasets, each explicitly targeting a specific atomic skill. Our empirical analysis confirms that proficiency in these atomic skills is strongly correlated with general long-context reasoning performance. Building on this insight, we apply reinforcement learning on these pseudo datasets to sharpen the model's atomic skills, with the aim of improving its general long-context reasoning ability. Extensive experiments across multiple benchmarks demonstrate the effectiveness of our approach: it outperforms a strong baseline by an average margin of 7.7 percentage points (improving from 46.3% to 54.0%) across Loogle, Loong, LongBench-v2, BrowscompLong, Ruler-qa2, and MRCR.
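The reported margin is an absolute difference between two average scores, not a relative improvement. The short sketch below makes that distinction explicit using only the two averages stated in the abstract. The function and variable names are illustrative assumptions, and the relative figure is derived here for comparison; it is not a number the paper reports.

```python
# Average scores over the six benchmarks, as stated in the abstract (percent).
baseline_avg = 46.3
improved_avg = 54.0


def absolute_gain(before: float, after: float) -> float:
    """Difference between two scores, in percentage points."""
    return after - before


def relative_gain(before: float, after: float) -> float:
    """Improvement expressed as a percentage of the starting score."""
    return (after - before) / before * 100.0


# Hypothetical names, used only for this illustration.
abs_points = absolute_gain(baseline_avg, improved_avg)
rel_percent = relative_gain(baseline_avg, improved_avg)

print(f"absolute gain: {abs_points:.1f} percentage points")  # 7.7
print(f"relative gain: {rel_percent:.1f}%")                  # 16.6
```

Read this way, the 7.7 figure corresponds to roughly a 16.6% relative improvement over the baseline average.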