This paper introduces ResMerge, a novel framework for merging reinforcement learning (RL) expert models by leveraging both leading spectral heads and residual components of task vectors. The authors demonstrate that while leading heads are informative, they are susceptible to conflicts, whereas residual components provide a more stable basis for merging. Experimental results show that ResMerge outperforms existing merging methods by better preserving the capabilities of the expert models across various RL tasks.
Merging RL experts effectively requires balancing the sharp, informative leading spectral heads against the stable, dispersed residual components. ResMerge does this by building a consensus backbone from the residuals and then reintroducing head information only where experts agree.
Model merging offers a training-free way to combine multiple post-trained expert models, but merging experts obtained through reinforcement learning (RL) remains challenging. Existing spectral merging methods often assume that leading singular directions contain the main task signal, while lower-energy residual components can be compressed, selected, or attenuated to reduce interference. We find that this assumption does not hold for RL task vectors: after decomposing each task vector into a leading spectral head and a residual component, both parts can independently recover substantial behavior knowledge, while exhibiting different merging properties. The head is highly concentrated and informative but more prone to sharp cross-expert conflicts, whereas the residual component is more dispersed and provides a more stable basis for aggregation. Based on this observation, we propose ResMerge, a residual-based spectral merging framework for RL experts. ResMerge first constructs a stable residual backbone with Spherical Residual Consensus Adaptation, which estimates a reliability-weighted consensus direction on the Frobenius sphere. It then reintroduces leading-head information through a Lightweight Head Correction module gated by positive cross-expert agreement. Experiments across multiple RL expert groups and capability domains show that ResMerge better preserves expert capabilities than representative task-vector and spectral merging baselines. The implementation of ResMerge is publicly available at https://github.com/sunyd0303-cpu/ResMerge-release.
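The head/residual decomposition described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `split_head_residual` and `spherical_consensus`, the head rank `k`, the uniform weights standing in for reliability scores, and the toy random matrices are all assumptions made for illustration. The sketch assumes a task vector is the difference between an expert's weight matrix and the shared base weights, takes the head to be the rank-`k` truncated SVD, and approximates a consensus direction on the Frobenius sphere by a weighted mean of unit-norm residuals projected back to unit norm.

```python
import numpy as np


def split_head_residual(task_vector, k):
    """Split a 2-D task vector into a rank-k spectral head and its residual."""
    u, s, vt = np.linalg.svd(task_vector, full_matrices=False)
    # Head: the k leading singular directions, scaled by their singular values.
    head = (u[:, :k] * s[:k]) @ vt[:k, :]
    # Residual: everything the head leaves out, so head + residual == task_vector.
    residual = task_vector - head
    return head, residual


def spherical_consensus(residuals, weights):
    """Weighted mean of unit-Frobenius-norm residuals, projected back to unit norm.

    The weights are a placeholder for reliability scores (an assumption).
    The result is undefined if the weighted mean cancels to zero.
    """
    units = [r / np.linalg.norm(r) for r in residuals]  # Frobenius norm for 2-D input
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    mean = sum(wi * ui for wi, ui in zip(w, units))
    return mean / np.linalg.norm(mean)


# Toy data: three hypothetical "experts" sharing one base weight matrix.
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 6))
experts = [base + 0.1 * rng.normal(size=(8, 6)) for _ in range(3)]
task_vectors = [expert - base for expert in experts]

parts = [split_head_residual(tv, k=2) for tv in task_vectors]
heads = [head for head, _ in parts]
residuals = [residual for _, residual in parts]

# Uniform weights here; a real method would estimate reliability per expert.
consensus = spherical_consensus(residuals, weights=[1.0, 1.0, 1.0])
```

Because the head is built from the leading singular directions, it is orthogonal to the residual under the Frobenius inner product, so the two parts can be analyzed and merged separately. The gated head correction step is not sketched here, since the abstract does not specify how the agreement gate is computed.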