This paper introduces Local Classifier Alignment (LCA), a novel loss function designed to mitigate the mismatch between task-specific classifiers and the evolving feature backbone in continual learning scenarios. LCA aims to improve classifier generalization across observed tasks and enhance robustness by aligning the classifier with the backbone. The authors integrate LCA into a model merging approach and demonstrate state-of-the-art or near state-of-the-art performance on several continual learning benchmarks.
Stop catastrophic forgetting in continual learning by better aligning your classifiers to your feature backbone with a new loss function.
A fundamental requirement for intelligent systems is the ability to learn continuously in changing environments. However, models trained in this regime often suffer from catastrophic forgetting. Leveraging pre-trained models has recently emerged as a promising solution, since their generalized feature extractors enable faster and more robust adaptation. While some earlier works mitigate forgetting by fine-tuning only on the first task, this approach quickly deteriorates as the number of tasks grows and the data distributions diverge. More recent research instead seeks to consolidate task knowledge into a unified backbone, or to adapt the backbone as new tasks arrive. However, such approaches may create a \textit{mismatch} between the task-specific classifiers and the adapted backbone. To address this issue, we propose a novel \textit{Local Classifier Alignment} (LCA) loss to better align the classifier with the backbone. Theoretically, we show that the LCA loss can enable the classifier not only to generalize well across all observed tasks, but also to become more robust. Furthermore, we develop a complete solution for continual learning that follows the model merging approach and uses LCA. Extensive experiments on several standard benchmarks demonstrate that our method often achieves leading performance, sometimes surpassing state-of-the-art methods by a large margin.