Tsinghua AI | Beihang | Jun 1, 2026 | arXiv:2606.02221

CORE-MTL: Rethinking Gradient Balancing via Causal Orthogonal Representations

Chengfeng Wu, Tao Zou, Yanru Wu, Jingge Wang

AI Summary

This paper introduces CORE-MTL (Causal Orthogonal Representations for Multi-Task Learning), a representation-centric framework that encourages a semantic-residual factorization of the shared representation: task-relevant structure is concentrated in a semantic stream, and nuisance variation is relegated to a residual stream. The authors argue that this reduces negative transfer and task gradient interference without explicit gradient projection or reweighting, and that it yields a tighter out-of-distribution generalization bound than optimization-centric methods. On visual multi-task benchmarks, CORE-MTL is reported to consistently outperform existing methods in both in-distribution and out-of-distribution settings.

Key Contribution

CORE-MTL separates task-relevant structure from nuisance variation in the shared representation, and is reported to outperform existing methods on visual multi-task benchmarks both in-distribution and out-of-distribution.

Abstract

Multi-task learning (MTL) aims to construct a joint model for multiple tasks by sharing a common representation across domains. To achieve this goal, existing optimization-centric methods either balance task gradients or modify the shared architecture. However, as these approaches remain agnostic to the content of the shared representation, they fail to disentangle task-relevant structure from spurious context, leading to negative transfer and poor generalization. To overcome this limitation, we propose Causal Orthogonal Representations for Multi-Task Learning (CORE-MTL), a causally motivated representation-centric framework that encourages a structured semantic-residual factorization of the shared representation, concentrating task-relevant structure in the semantic stream while relegating nuisance variation to the residual stream. We instantiate this framework in the visual domain by leveraging physical priors for structured scenes and statistical constraints for attributes. Theoretically, our method enjoys a tighter out-of-distribution generalization bound than optimization-centric methods and reduces task gradient interference without explicit gradient projection or reweighting. Empirically, CORE-MTL consistently outperforms existing methods on visual multi-task benchmarks in both in-distribution and out-of-distribution settings. Code is publicly available at https://github.com/Hope-Rita/CORE-MTL.
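The abstract does not spell out the training objective, so the following is only a rough sketch of what a semantic-residual factorization with an orthogonality constraint could look like. It assumes the shared feature vector is split at a fixed index `k` and that the two streams are decorrelated by penalizing their batch cross-covariance. The names `split_streams`, `center_columns`, and `orthogonality_penalty`, the split point, and the penalty form are all hypothetical and are not taken from the paper or its repository.

```python
# Illustrative sketch only: the split point, function names, and penalty form
# are assumptions, not the objective used by CORE-MTL.

def split_streams(z, k):
    """Split one shared feature vector into (semantic, residual) parts."""
    return z[:k], z[k:]

def center_columns(rows):
    """Subtract each column's batch mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def orthogonality_penalty(batch, k):
    """Squared Frobenius norm of the cross-covariance between the two streams.

    The value is zero when every semantic dimension is uncorrelated with
    every residual dimension over the batch.
    """
    centered = center_columns(batch)
    sem = [split_streams(row, k)[0] for row in centered]
    res = [split_streams(row, k)[1] for row in centered]
    n = len(batch)
    penalty = 0.0
    for s_col in zip(*sem):
        for r_col in zip(*res):
            cov = sum(s * r for s, r in zip(s_col, r_col)) / n
            penalty += cov ** 2
    return penalty

# Toy batches of 2-dimensional features: dimension 0 is treated as the
# semantic stream and dimension 1 as the residual stream (k=1).
# Here the residual copies the semantic dimension, so the streams are entangled.
entangled = [[1.0, 1.0], [-1.0, -1.0], [2.0, 2.0], [-2.0, -2.0]]
# Here the residual varies independently of the semantic dimension.
decorrelated = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]

print(orthogonality_penalty(entangled, k=1))     # 6.25
print(orthogonality_penalty(decorrelated, k=1))  # 0.0
```

In a real model such a term would be added to the task losses and minimized by gradient descent over the encoder. The toy batches only show that the penalty is large when the residual stream mirrors the semantic one and vanishes when the two are uncorrelated.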

Architecture Design (Transformers, SSMs, MoE) | Natural Language Processing | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
