Tsinghua AI | SJTU | Feb 26, 2026 | arXiv:2602.22538

RAIN-Merging: A Gradient-Free Method to Enhance Instruction Following in Large Reasoning Models with Preserved Thinking Format

Zhehao Huang, Yuhang Liu, Baijiong Lin, Yixin Lou, Zhengbao He, Hanling Tian, Tao Li, Xiaolin Huang

AI Summary

The paper introduces RAIN-Merging, a gradient-free merging method to improve instruction following in large reasoning models (LRMs) without sacrificing reasoning performance. It addresses the output format mismatch between instruction-tuned models (ITMs) and LRMs by projecting the ITM task vector onto the null space of forward features at thinking tokens and using instruction attention to scale module-specific components. Experiments on instruction-following and reasoning benchmarks demonstrate that RAIN-Merging significantly enhances instruction adherence while preserving reasoning quality across various model scales and architectures.

Key Contribution

RAIN-Merging is a gradient-free technique that merges instruction-tuned capabilities into large reasoning models, improving instruction following while preserving the model's step-by-step thinking format and reasoning performance.

Abstract

Large reasoning models (LRMs) excel at long chains of reasoning but often fail to faithfully follow instructions regarding output format, constraints, or specific requirements. We investigate whether this gap can be closed by integrating an instruction-tuned model (ITM) into an LRM. Analyzing their differences in parameter space, namely task vectors, we find that their principal subspaces are nearly orthogonal across key modules, suggesting that a lightweight merge is possible with minimal interference. However, we also demonstrate that naive merges are fragile because they overlook the output format mismatch between LRMs (with explicit thinking and response segments) and ITMs (answer-only). We introduce RAIN-Merging (Reasoning-Aware Instruction-attention guided Null-space projection Merging), a gradient-free method that integrates instruction following while preserving thinking format and reasoning performance. First, with a small reasoning calibration set, we project the ITM task vector onto the null space of forward features at thinking special tokens, which preserves the LRM's structured reasoning mechanisms. Second, using a small instruction calibration set, we estimate instruction attention to derive module-specific scaling that amplifies instruction-relevant components and suppresses leakage. Across four instruction-following benchmarks and nine reasoning and general-capability benchmarks, RAIN-Merging substantially improves instruction adherence while maintaining reasoning quality. The gains are consistent across model scales and architectures and translate to improved performance in agent settings.
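
The null-space projection step described in the abstract can be illustrated for a single linear module. The following is a minimal NumPy sketch, not the authors' implementation. It assumes a weight matrix of shape (d_out, d_in) applied as W x, a task vector `delta` taken as the ITM weights minus the weights of a shared base model, and a matrix `features` of module inputs collected at thinking special tokens on a calibration set. The function names, the tolerance `rel_tol`, and the scalar `scale` (a stand-in for the paper's attention-derived, module-specific scaling) are all hypothetical.

```python
import numpy as np


def null_space_projector(features: np.ndarray, rel_tol: float = 1e-8) -> np.ndarray:
    """Build P = I - V V^T, where the columns of V span the calibration features.

    features: (n_samples, d_in) inputs seen by a linear module at thinking tokens.
    Any matrix right-multiplied by P maps every calibration feature to zero.
    """
    d_in = features.shape[1]
    _, s, vt = np.linalg.svd(features, full_matrices=False)
    # Keep only directions with non-negligible singular values.
    rank = int(np.sum(s > rel_tol * s[0])) if s.size and s[0] > 0 else 0
    v = vt[:rank].T  # (d_in, rank), orthonormal columns
    return np.eye(d_in) - v @ v.T


def merge_with_projection(w_lrm, delta, features, scale=1.0):
    """Add the projected task vector to the LRM weight (hypothetical helper).

    scale is a plain scalar placeholder for a module-specific coefficient.
    """
    p = null_space_projector(features)
    return w_lrm + scale * (delta @ p)


# Toy data standing in for real model weights and calibration features.
rng = np.random.default_rng(0)
d_out, d_in = 4, 8
feats = rng.normal(size=(3, d_in))       # 3 calibration features
w_lrm = rng.normal(size=(d_out, d_in))   # LRM weight for one module
delta = rng.normal(size=(d_out, d_in))   # assumed ITM task vector
w_merged = merge_with_projection(w_lrm, delta, feats, scale=0.5)

# Outputs on the calibration features are unchanged by the merge.
unchanged = np.allclose(w_merged @ feats.T, w_lrm @ feats.T)
```

Because the projected update vanishes on every calibration feature, the merged module reproduces the LRM's outputs on those inputs, while inputs outside the span of the calibration features still receive the instruction-tuned update.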

Architecture Design (Transformers, SSMs, MoE) | Reasoning & Chain-of-Thought | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 71

Year: 2026

Venue: N/A
