This paper introduces Retrieval-Augmented Reinforcement Fine-Tuning (RA-RFT), a post-training framework that improves language models' reasoning by retrieving analogous problems rather than merely semantically similar ones. A retriever trained with gold-relevance distillation ranks contexts by expected reasoning benefit, and the policy model is then fine-tuned with reinforcement learning on the retrieved demonstrations under verifiable outcome rewards. On mathematical reasoning benchmarks, RA-RFT outperforms standard reinforcement fine-tuning: on AIME 2025 it improves average@32 accuracy over GRPO by 7.1 points for Qwen3-1.7B and 2.8 points for Qwen3-4B, which suggests that reasoning-aware retrieval is complementary to advances in reward design or training curricula.
Reasoning-aware retrieval can boost language model performance by surfacing diverse solution strategies that similarity-based retrieval overlooks.
Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowledge, yet conventional retrieval based on lexical or semantic similarity is poorly suited for complex reasoning tasks: a semantically similar problem may demand an entirely different solution strategy, while a superficially different problem may share the same underlying reasoning pattern. We propose Retrieval-Augmented Reinforcement Fine-Tuning (RA-RFT), a post-training framework that teaches language models to reason by analogy. RA-RFT uses gold-relevance distillation to train a retriever that ranks contexts by expected reasoning benefit rather than semantic overlap, and then fine-tunes the policy model via reinforcement fine-tuning with retrieved analogous demonstrations, so the model learns to leverage reasoning traces under verifiable outcome rewards. We further analyze the diversity of retrieved contexts and find that reasoning-aware retrieval surfaces complementary solution strategies that provide distinct reasoning scaffolds for individual problems. Across challenging mathematical reasoning benchmarks, RA-RFT consistently outperforms standard reinforcement fine-tuning methods. For example, it improves AIME 2025 average@32 accuracy by 7.1 and 2.8 points over GRPO for Qwen3-1.7B and Qwen3-4B, respectively, suggesting that reasoning-aware retrieval is a complementary axis of improvement, orthogonal to advances in reward design or training curricula.
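To make two ideas from the abstract concrete, the following minimal Python sketch shows how an average@k score can be computed and how candidate demonstrations might be ranked by an estimated reasoning benefit instead of by similarity. The abstract does not define gold-relevance distillation in detail, so the benefit estimate used here (solve rate with the demonstration in context minus the solve rate without it) is an assumption for illustration, not the paper's method. The function names and the toy rollout data are likewise hypothetical.

```python
def average_at_k(outcomes):
    """Average@k in percent.

    `outcomes` holds one list per problem, each with k entries of 0/1
    marking whether a sampled response was verified correct.
    """
    per_problem = [sum(samples) / len(samples) for samples in outcomes]
    return 100.0 * sum(per_problem) / len(per_problem)


def reasoning_benefit(with_demo, baseline):
    """Assumed proxy for reasoning benefit: change in solve rate when a
    demonstration is placed in context (not the paper's definition)."""
    return sum(with_demo) / len(with_demo) - sum(baseline) / len(baseline)


def rank_by_benefit(candidates, baseline):
    """Order candidate demonstrations from most to least helpful.

    `candidates` maps a demonstration name to the 0/1 rollout outcomes
    observed with that demonstration in context.
    """
    scores = {name: reasoning_benefit(o, baseline) for name, o in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: four rollouts on one target problem (hypothetical values).
baseline = [0, 1, 0, 0]
candidates = {
    "similar_wording": [0, 1, 0, 0],   # semantically close, no help
    "same_strategy": [1, 1, 1, 0],     # different surface, shared reasoning
    "distractor": [0, 0, 0, 0],        # actively harmful
}
ranking = rank_by_benefit(candidates, baseline)
```

In this toy setting the demonstration that shares a solution strategy ranks first even though another candidate is closer in wording, which is the behavior the abstract attributes to reasoning-aware retrieval. Under the assumption above, scores of this kind would be the targets a retriever is trained to reproduce.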