This paper introduces an open-source pipeline for retrofitting LLMs (specifically Qwen-2.5-0.5B) with sparse memory modules to enable continual learning. The key innovation is a KL-divergence-based slot selection mechanism for memory updates, prioritizing informationally "surprising" tokens. Experiments show that models retrofitted with this method acquire new factual knowledge with minimal forgetting, supporting the sparse update hypothesis.
Forget catastrophic forgetting: sparse memory finetuning, enhanced with a KL-divergence-based update rule, lets LLMs learn continually without trashing old knowledge.
Large Language Models (LLMs) are typically static after training, yet real-world applications require continual adaptation to new knowledge without degrading existing capabilities. Standard approaches to updating models, such as full finetuning or parameter-efficient methods (e.g., LoRA), suffer from catastrophic forgetting: because they modify shared dense representations, updates for new knowledge interfere with previously learned capabilities. Sparse Memory Finetuning (SMF) offers a promising alternative by localizing updates to a small subset of parameters in explicit memory layers. In this work, we present an open-source pipeline for retrofitting existing pretrained models (Qwen-2.5-0.5B) with sparse memory modules, enabling effective continual learning on consumer hardware. We extend prior work by introducing a theoretically grounded slot-selection mechanism based on Kullback-Leibler (KL) divergence, which prioritizes memory updates for informationally "surprising" tokens relative to a background distribution. Our experiments demonstrate that retrofitted models can acquire new factual knowledge with minimal forgetting of held-out capabilities, validating the sparse update hypothesis in a practical setting.
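As a rough illustration of the slot-selection rule described in the abstract (not the paper's released pipeline), the sketch below scores each token position by the KL divergence between the model's next-token distribution and a background distribution, then restricts memory updates to the slots hit by the most surprising tokens. All names here (surprise_scores, select_update_slots, the tensor shapes, and the per-token slot assignment) are illustrative assumptions.

```python
# Hypothetical sketch of KL-based slot selection for sparse memory updates.
# Assumes two sets of logits over the same vocabulary: the adapted model's
# and a fixed "background" reference model's (e.g., the frozen base model).
import torch
import torch.nn.functional as F


def surprise_scores(model_logits: torch.Tensor,
                    background_logits: torch.Tensor) -> torch.Tensor:
    """Per-token KL(p_model || p_background) over the vocabulary.

    model_logits, background_logits: (seq_len, vocab_size)
    returns: (seq_len,) one divergence score per token position
    """
    log_p = F.log_softmax(model_logits, dim=-1)
    log_q = F.log_softmax(background_logits, dim=-1)
    # KL(p || q) = sum_v p(v) * (log p(v) - log q(v))
    return (log_p.exp() * (log_p - log_q)).sum(dim=-1)


def select_update_slots(scores: torch.Tensor,
                        slot_indices: torch.Tensor,
                        k: int) -> torch.Tensor:
    """Keep only the memory slots activated by the k most surprising tokens.

    scores: (seq_len,) KL scores from surprise_scores
    slot_indices: (seq_len,) index of the memory slot each token routes to
    returns: unique slot indices eligible for a gradient update
    """
    topk = torch.topk(scores, k=min(k, scores.numel())).indices
    return slot_indices[topk].unique()
```

In a training loop, one plausible use is to zero the gradients of every memory row except those returned by select_update_slots (for example, via a backward hook on the memory embedding), so that the frozen backbone and all unselected slots stay untouched while the surprising tokens write into memory.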