The paper introduces Recursive Self-Aggregation (RSA), a test-time scaling method for LLMs that iteratively refines a population of reasoning chains by aggregating subsets of candidate solutions. RSA exploits information from intermediate reasoning steps to bootstrap from partially correct chains of thought, combining the benefits of parallel and sequential scaling. Empirically, RSA improves performance across diverse tasks, model families, and sizes, enabling smaller models such as Qwen3-4B-Instruct-2507 to compete with larger reasoning models.
By recursively aggregating reasoning chains, even smaller LLMs can now achieve performance competitive with much larger models, challenging the assumption that scale is the only path to improved reasoning.
Test-time scaling methods improve the capabilities of large language models (LLMs) by increasing the amount of compute used during inference to make a prediction. Inference-time compute can be scaled in parallel by choosing among multiple independent solutions or sequentially through self-refinement. We propose Recursive Self-Aggregation (RSA), a test-time scaling method inspired by evolutionary methods that combines the benefits of both parallel and sequential scaling. Each step of RSA refines a population of candidate reasoning chains through aggregation of subsets to yield a population of improved solutions, which are then used as the candidate pool for the next iteration. RSA exploits the rich information embedded in the reasoning chains -- not just the final answers -- and enables bootstrapping from partially correct intermediate steps within different chains of thought. Empirically, RSA delivers substantial performance gains with increasing compute budgets across diverse tasks, model families and sizes. Notably, RSA enables Qwen3-4B-Instruct-2507 to achieve competitive performance with larger reasoning models, including DeepSeek-R1 and o3-mini (high), while outperforming purely parallel and sequential scaling strategies across AIME-25, HMMT-25, Reasoning Gym, LiveCodeBench-v6, and SuperGPQA. We further demonstrate that training the model to combine solutions via a novel aggregation-aware reinforcement learning approach yields significant performance gains. Code available at https://github.com/HyperPotatoNeo/RSA.
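To make the loop concrete, here is a minimal sketch of the RSA procedure described in the abstract: maintain a population of reasoning chains, repeatedly aggregate random subsets into improved chains, and use the aggregated chains as the next candidate pool. The `llm_generate` stub, the prompt wording, and the hyperparameter values are illustrative assumptions, not the paper's exact setup; see the repository linked above for the authors' implementation.

```python
import random

def llm_generate(prompt: str) -> str:
    # Hypothetical placeholder: plug in your LLM inference call here.
    raise NotImplementedError

def rsa(problem: str, n_population: int = 8, k_subset: int = 4, n_steps: int = 3) -> list[str]:
    # Step 0: sample an initial population of independent reasoning chains (parallel scaling).
    population = [llm_generate(problem) for _ in range(n_population)]

    for _ in range(n_steps):
        new_population = []
        for _ in range(n_population):
            # Aggregate a random subset of candidate chains into one improved chain,
            # letting the model reuse correct intermediate steps from any of them.
            subset = random.sample(population, k_subset)
            candidates = "\n\n".join(
                f"Candidate solution {i + 1}:\n{chain}" for i, chain in enumerate(subset)
            )
            aggregation_prompt = (
                f"{problem}\n\n"
                f"Below are several candidate solutions, possibly partially correct:\n\n"
                f"{candidates}\n\n"
                "Combine their useful intermediate steps into a single improved solution."
            )
            new_population.append(llm_generate(aggregation_prompt))
        # The aggregated chains become the candidate pool for the next iteration (sequential scaling).
        population = new_population

    return population
```

Aggregating small subsets rather than the full population keeps each prompt short while preserving diversity across the pool, which is what gives the procedure its evolutionary flavor.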