The paper introduces CLAG, a novel memory framework for small language model (SLM) agents that organizes memories into semantically coherent clusters. An SLM-driven router assigns incoming memories to clusters and generates cluster-specific profiles, enabling localized memory evolution and reducing cross-topic interference. Experiments on QA datasets demonstrate that CLAG enhances answer quality and robustness compared to existing memory systems, while remaining lightweight and efficient for SLMs.
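The Python sketch below is a rough illustration of the routing and profiling step summarized above. It rests on two stand-ins. The SLM router is replaced by a word-overlap (Jaccard) score between a new memory and each cluster's tags, and SLM profile generation is replaced by frequency-based tagging. CLAG itself uses a language model for both steps. The names `Cluster`, `route`, `refresh_profile`, `STOP`, and the `threshold` value are hypothetical and do not come from the paper.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical stop-word list used only by this toy tagger.
STOP = {"the", "a", "an", "of", "to", "in", "on", "and", "is", "was", "for", "with"}


def tokens(text: str) -> list[str]:
    """Lowercased content words, de-duplicated, in first-seen order."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return list(dict.fromkeys(w for w in words if w and w not in STOP))


@dataclass
class Cluster:
    memories: list[str] = field(default_factory=list)
    tags: set[str] = field(default_factory=set)
    summary: str = ""

    def refresh_profile(self, k: int = 5) -> None:
        """Stand-in for SLM profile generation: top-k most frequent words become tags."""
        counts = Counter(t for m in self.memories for t in tokens(m))
        self.tags = {w for w, _ in counts.most_common(k)}
        self.summary = self.memories[0]  # hypothetical: first memory doubles as summary


def route(clusters: list[Cluster], memory: str, threshold: float = 0.2) -> int:
    """Stand-in for the SLM router: pick the cluster whose tags best overlap the memory."""
    words = set(tokens(memory))
    best, best_score = -1, 0.0
    for i, c in enumerate(clusters):
        union = words | c.tags
        score = len(words & c.tags) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = i, score
    if best == -1 or best_score < threshold:
        clusters.append(Cluster())  # no profile matches well enough: open a new cluster
        best = len(clusters) - 1
    clusters[best].memories.append(memory)
    clusters[best].refresh_profile()  # localized update: only the receiving cluster changes
    return best


clusters: list[Cluster] = []
memories = [
    "Alice adopted a cat named Milo",
    "Milo the cat likes tuna",
    "Bob is training for a marathon in April",
    "Bob ran a half marathon last week",
]
assignments = [route(clusters, m) for m in memories]
print(assignments)
```

In this toy run the two cat memories share one cluster and the two marathon memories share another. A new cluster opens whenever no existing profile clears the threshold. Only the receiving cluster's profile is refreshed, which loosely mirrors the localized evolution described above.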
Small language models can achieve surprisingly robust question answering by actively clustering their memories into semantically coherent groups, outperforming prior agent memory systems.
Large language model agents rely heavily on external memory to support knowledge reuse and complex reasoning. Yet most memory systems store experiences in a single global retrieval pool, which can gradually dilute or corrupt stored knowledge. This problem is especially pronounced for small language models (SLMs), which are highly vulnerable to irrelevant context. We introduce CLAG, a CLustering-based AGentic memory framework in which an SLM agent actively organizes memory by clustering. CLAG employs an SLM-driven router to assign incoming memories to semantically coherent clusters and autonomously generates cluster-specific profiles, including topic summaries and descriptive tags, that establish each cluster as a self-contained functional unit. By performing localized evolution within these structured neighborhoods, CLAG reduces cross-topic interference and increases internal memory density. During retrieval, the framework uses a two-stage process: it first filters for relevant clusters via their profiles, excluding distractors and narrowing the search space, and then retrieves memories from within the selected clusters. Experiments on multiple QA datasets with three SLM backbones show that CLAG consistently improves answer quality and robustness over prior memory systems for agents while remaining lightweight and efficient.
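The two-stage retrieval can be sketched in the same hedged way. The Python below assumes that clusters already carry profile tags. In stage 1 it scores each profile against the query with Jaccard overlap, a stand-in for whatever profile matching CLAG actually performs, and keeps only the best-matching cluster(s). In stage 2 it ranks individual memories from those clusters alone. The names `retrieve`, `top_clusters`, and `top_k`, and the rule of dropping zero-overlap clusters, are illustrative assumptions rather than the paper's method.

```python
from dataclasses import dataclass

# Hypothetical stop-word list for this toy scorer.
STOP = {"the", "a", "an", "of", "to", "in", "on", "and", "is", "was", "for", "with"}


def tokens(text: str) -> set[str]:
    """Lowercased content words of a text."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return {w for w in words if w and w not in STOP}


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Cluster:
    tags: set[str]  # profile tags; in CLAG an SLM would generate these
    memories: list[str]


def retrieve(clusters: list[Cluster], query: str, top_clusters: int = 1, top_k: int = 2) -> list[str]:
    q = tokens(query)
    # Stage 1: rank clusters by profile match and keep the best ones (distractors drop out).
    ranked = sorted(clusters, key=lambda c: jaccard(q, c.tags), reverse=True)
    kept = [c for c in ranked[:top_clusters] if jaccard(q, c.tags) > 0]
    # Stage 2: rank only the memories inside the kept clusters.
    candidates = [m for c in kept for m in c.memories]
    candidates.sort(key=lambda m: jaccard(q, tokens(m)), reverse=True)
    return candidates[:top_k]


clusters = [
    Cluster({"cat", "milo", "alice", "pet"},
            ["Alice adopted a cat named Milo", "Milo the cat likes tuna"]),
    Cluster({"bob", "marathon", "running", "training"},
            ["Bob is training for a marathon in April", "Bob ran a half marathon last week"]),
]
cat_hits = retrieve(clusters, "What food does Milo the cat like?")
run_hits = retrieve(clusters, "When is Bob's marathon?")
print(cat_hits)
print(run_hits)
```

For the cat question, stage 1 filters out the marathon cluster, so its memories never compete with the cat memories in stage 2. This is the kind of search-space reduction the abstract attributes to profile-based cluster filtering.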