This paper benchmarks catastrophic forgetting mitigation strategies for continual intent classification using the CLINC150 dataset in a 10-task label-disjoint setting. It evaluates ANN, GRU, and Transformer architectures with replay-based (MIR), regularization-based (LwF), and parameter-isolation (HAT) continual learning methods, both individually and in combination. Results show that replay, implemented here as MIR, is crucial for mitigating forgetting, and that the optimal CL configuration is architecture-dependent, with some configurations even surpassing joint training performance.
Naive fine-tuning leads to catastrophic forgetting, but combining replay-based and parameter-isolation strategies can, in several cases, actually *improve* performance over joint training in continual intent classification.
Neural language models deployed in real-world applications must continually adapt to new tasks and domains without forgetting previously acquired knowledge. This work presents a comparative empirical study of catastrophic forgetting mitigation in continual intent classification. Using the CLINC150 dataset, we construct a 10-task label-disjoint scenario and evaluate three backbone architectures: a feed-forward Artificial Neural Network (ANN), a Gated Recurrent Unit (GRU), and a Transformer encoder, under a range of continual learning (CL) strategies. We consider one representative method from each major CL family: replay-based Maximally Interfered Retrieval (MIR), regularization-based Learning without Forgetting (LwF), and parameter-isolation via Hard Attention to Task (HAT), both individually and in all pairwise and triple combinations. Performance is assessed with average accuracy, macro F1, and backward transfer, capturing the stability-plasticity trade-off across the task sequence. Our results show that naive sequential fine-tuning suffers from severe forgetting for all architectures and that no single CL method fully prevents it. Replay emerges as a key ingredient: MIR is the most reliable individual strategy, and combinations that include replay (MIR+HAT, MIR+LwF, MIR+LwF+HAT) consistently achieve high final performance with near-zero or mildly positive backward transfer. The optimal configuration is architecture-dependent: MIR+HAT yields the best result for ANN and Transformer, whereas MIR+LwF+HAT works best for GRU. In several cases, CL methods even surpass joint training, indicating a regularization effect. These findings highlight the importance of jointly selecting backbone architecture and CL mechanism when designing continual intent-classification systems.
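The abstract names average accuracy and backward transfer without giving their formulas. The following is a minimal sketch of how these two metrics are commonly computed in the continual learning literature, assuming an accuracy matrix `acc` in which `acc[i][j]` is the test accuracy on task `j` after training on task `i`. The function names, the matrix layout, and the toy numbers are illustrative assumptions, not taken from the paper, and the paper's exact definitions may differ.

```python
from typing import List


def average_accuracy(acc: List[List[float]]) -> float:
    """Mean accuracy over all tasks, measured after training on the final task."""
    final_row = acc[-1]
    return sum(final_row) / len(final_row)


def backward_transfer(acc: List[List[float]]) -> float:
    """Average change in accuracy on each earlier task between the moment it
    was learned (acc[j][j]) and the end of the sequence (acc[-1][j]).
    Negative values indicate forgetting; positive values indicate that later
    training improved earlier tasks."""
    num_tasks = len(acc)
    if num_tasks < 2:
        raise ValueError("backward transfer needs at least two tasks")
    deltas = [acc[-1][j] - acc[j][j] for j in range(num_tasks - 1)]
    return sum(deltas) / (num_tasks - 1)


# Hypothetical 3-task accuracy matrix (made-up numbers, for illustration only).
toy = [
    [0.90, 0.00, 0.00],  # after training on task 0
    [0.60, 0.80, 0.00],  # after training on task 1
    [0.50, 0.70, 0.90],  # after training on task 2
]

print(average_accuracy(toy))
print(backward_transfer(toy))
```

On this toy matrix the average accuracy is about 0.70 and the backward transfer is about -0.25, which reflects forgetting on the first two tasks. Under this convention, the "near-zero or mildly positive backward transfer" reported for replay-based combinations corresponds to final-row accuracies that match or slightly exceed the diagonal entries.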