Apr 18, 2026arXiv:2604.17078

Understanding and Enforcing Weight Disentanglement in Task Arithmetic

Shan Liu, Yuehan Yin, Lei Wang, Qi Fan, Yinghuan Shi, Wenbin Li, Yang Gao, Dacheng Tao

AI Summary

This paper investigates the theoretical underpinnings of task arithmetic, identifying Task-Feature Specialization (TFS) as a sufficient condition for weight disentanglement. They demonstrate that TFS leads to observable weight vector orthogonality, allowing them to propose OrthoReg, a regularization method that promotes weight disentanglement by enforcing orthogonality during fine-tuning. Experiments show that OrthoReg significantly improves the performance of task arithmetic methods.

Key Contribution

Task arithmetic works because models internally allocate distinct features to different tasks, and enforcing this specialization via orthogonality regularization unlocks even better editing.

Abstract

Task arithmetic provides an efficient, training-free way to edit pre-trained models, yet lacks a fundamental theoretical explanation for its success. The existing concept of ``weight disentanglement"describes the ideal outcome of non-interfering task composition but does not reveal its underlying cause. Crucially, what intrinsic properties of the pre-trained model ($\theta_0$) or the task vectors ($\tau_t$) enable this disentanglement remains underexplored. In this paper, we introduce Task-Feature Specialization (TFS), a model's ability to allocate distinct internal features to different tasks, as the fundamental principle. We first prove that TFS is a sufficient condition for weight disentanglement. More importantly, we find that TFS also gives rise to an observable geometric consequence: weight vector orthogonality. This positions TFS as the common cause for both the desired functional outcome (disentanglement) and a measurable geometric property (orthogonality). This relationship provides the key insight for our method: since the abstract TFS property is intractable to enforce directly, we can instead promote weight disentanglement by shaping its concrete geometric consequence, orthogonality. Therefore, we propose OrthoReg, a simple and effective regularization method that actively enforces an internal orthogonal structure on weight updates ($\Delta W$) that constitute $\tau_t$ during fine-tuning. And we theoretically prove that OrthoReg promotes disentanglement. Extensive experiments demonstrate that OrthoReg consistently and significantly enhances the performance of various task arithmetic methods. Code is available at \href{https://github.com/RL-MIND/OrthoReg}{https://github.com/RL-MIND/OrthoReg}.

Architecture Design (Transformers, SSMs, MoE)Interpretability & Mechanistic Interp Training Efficiency & Optimization

Citation Metrics

Citations0

Influential citations0

References57

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

Understanding and Enforcing Weight Disentanglement in Task Arithmetic

Related Papers