This paper addresses inter-task interference in model merging by introducing Essential Subspace Decomposition (ESD) and Essential Subspace Merging (ESM). By analyzing the output shifts induced by task updates, the authors find that most of the energy is concentrated in a small number of principal directions, and they use this observation to reduce interference during merging. ESM and its dynamic extension ESM++ are both training-free; they integrate task-specific knowledge while preserving performance across multiple task sets and model scales.
Models can be merged for multi-task learning without training: keeping each task update's essential subspace preserves task knowledge while reducing inter-task interference.
Model merging aims to enable multi-task learning by integrating the capabilities of multiple models fine-tuned from the same pre-trained checkpoint into a single model. Its core challenge is inter-task interference among task-specific parameter updates. In this paper, we analyze the output shifts induced by task updates and observe that their energy is concentrated in a small number of principal directions. We call the subspace spanned by these directions the essential subspace. In contrast, most remaining directions carry little task-relevant energy, but their accumulation across multiple task updates can cause severe interference during merging. Motivated by this observation, we propose Essential Subspace Decomposition (ESD), which decomposes each task update according to the principal components of its activation shift. Based on ESD, we introduce Essential Subspace Merging (ESM), a training-free static merging method that orthogonalizes and fuses essential components into one compact multi-task model. We further extend ESM to ESM++, a training-free dynamic merging method that decomposes task-specific residuals into low-rank experts and selects the most relevant expert through prototype-based routing during forward inference. Extensive experiments across multiple task sets and model scales demonstrate that ESM and ESM++ effectively preserve task knowledge while reducing inter-task interference.
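The decomposition idea in the abstract can be illustrated with a small NumPy sketch for a single linear layer. This is not the authors' implementation. It assumes the "activation shift" is the product of sample input activations with the task update, that principal directions come from an uncentered SVD of that shift, and that the essential component is the projection of the update onto the top-k directions. The function name `essential_subspace_decompose` and all parameter names are hypothetical.

```python
import numpy as np

def essential_subspace_decompose(delta_w, activations, k):
    """Split a task update into an essential part and a residual (illustrative sketch).

    delta_w:     (d_out, d_in) task update, i.e. fine-tuned minus pre-trained weights.
    activations: (n, d_in) sample inputs to the layer (assumed to be available).
    k:           number of principal directions to keep (assumed hyperparameter).
    """
    # Output shift induced by the update on the sample inputs: shape (n, d_out).
    shift = activations @ delta_w.T
    # Right singular vectors of the shift are its principal directions in output space.
    _, sing, vt = np.linalg.svd(shift, full_matrices=False)
    basis = vt[:k].T                      # (d_out, k) orthonormal basis
    essential = basis @ (basis.T @ delta_w)  # projection onto the top-k directions
    residual = delta_w - essential
    # Fraction of the shift energy captured by the kept directions.
    energy = float(np.sum(sing[:k] ** 2) / np.sum(sing ** 2))
    return essential, residual, energy

# Synthetic example: a nearly rank-3 update plus small noise.
rng = np.random.default_rng(0)
d_out, d_in, n, rank = 16, 32, 200, 3
delta_w = rng.normal(size=(d_out, rank)) @ rng.normal(size=(rank, d_in))
delta_w += 0.01 * rng.normal(size=(d_out, d_in))
activations = rng.normal(size=(n, d_in))

essential, residual, energy = essential_subspace_decompose(delta_w, activations, k=rank)
print(f"energy captured by top-{rank} directions: {energy:.4f}")
```

On this synthetic update, almost all of the shift energy falls in the top three directions, which mirrors the concentration the abstract describes. How ESM orthogonalizes and fuses the essential components across tasks, and how ESM++ routes among residual experts, is not specified here and is left out of the sketch.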