This paper introduces an open-source pipeline for retrofitting LLMs (specifically Qwen-2.5-0.5B) with sparse memory modules to enable continual learning. The key innovation is a KL-divergence-based slot selection mechanism for memory updates, prioritizing informationally "surprising" tokens. Experiments show that models retrofitted with this method acquire new factual knowledge with minimal forgetting, supporting the sparse update hypothesis.
Forget catastrophic forgetting: sparse memory finetuning, enhanced with a KL-divergence-based update rule, lets LLMs learn continually without trashing old knowledge.
Large Language Models (LLMs) are typically static after training, yet real-world applications require continual adaptation to new knowledge without degrading existing capabilities. Standard approaches to updating models, such as full finetuning or parameter-efficient methods (e.g., LoRA), suffer from catastrophic forgetting: because they modify shared dense representations, updates for new knowledge interfere with previously learned capabilities. Sparse Memory Finetuning (SMF) offers a promising alternative by localizing updates to a small subset of parameters in explicit memory layers. In this work, we present an open-source pipeline for retrofitting existing pretrained models (Qwen-2.5-0.5B) with sparse memory modules, enabling effective continual learning on consumer hardware. We extend prior work by introducing a theoretically grounded slot-selection mechanism based on Kullback-Leibler (KL) divergence, which prioritizes memory updates for informationally "surprising" tokens relative to a background distribution. Our experiments demonstrate that retrofitted models can acquire new factual knowledge with minimal forgetting of held-out capabilities, validating the sparse update hypothesis in a practical setting.
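As a rough illustration of the slot-selection rule described in the abstract (not the paper's released pipeline), the sketch below scores each token position by the KL divergence between the model's next-token distribution and a background distribution, then restricts memory updates to the slots hit by the most surprising tokens. All names here (surprise_scores, select_update_slots, the tensor shapes, and the per-token slot assignment) are illustrative assumptions.

```python
# Hypothetical sketch of KL-based slot selection for sparse memory updates.
# Assumes two sets of logits over the same vocabulary: the adapted model's
# and a fixed "background" reference model's (e.g., the frozen base model).
import torch
import torch.nn.functional as F


def surprise_scores(model_logits: torch.Tensor,
                    background_logits: torch.Tensor) -> torch.Tensor:
    """Per-token KL(p_model || p_background) over the vocabulary.

    model_logits, background_logits: (seq_len, vocab_size)
    returns: (seq_len,) one divergence score per token position
    """
    log_p = F.log_softmax(model_logits, dim=-1)
    log_q = F.log_softmax(background_logits, dim=-1)
    # KL(p || q) = sum_v p(v) * (log p(v) - log q(v))
    return (log_p.exp() * (log_p - log_q)).sum(dim=-1)


def select_update_slots(scores: torch.Tensor,
                        slot_indices: torch.Tensor,
                        k: int) -> torch.Tensor:
    """Keep only the memory slots activated by the k most surprising tokens.

    scores: (seq_len,) KL scores from surprise_scores
    slot_indices: (seq_len,) index of the memory slot each token routes to
    returns: unique slot indices eligible for a gradient update
    """
    topk = torch.topk(scores, k=min(k, scores.numel())).indices
    return slot_indices[topk].unique()
```

In a training loop, one plausible use is to zero the gradients of every memory row except those returned by select_update_slots (for example, via a backward hook on the memory embedding), so that the frozen backbone and all unselected slots stay untouched while the surprising tokens write into memory.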