Tsinghua AI | Oct 20, 2025 | arXiv:2510.17281

MemoryBench: A Benchmark for Memory and Continual Learning in LLM Systems

Qingyao Ai, Yichen Tang, Changyue Wang, Jianming Long, Weihang Su, Yiqun Liu

AI Summary

The paper introduces MemoryBench, a new benchmark designed to evaluate the continual learning capabilities of LLM systems by simulating user feedback across diverse domains, languages, and task types. Unlike existing benchmarks that focus on homogeneous reading comprehension, MemoryBench assesses the ability of LLMs to learn and adapt from accumulated user interactions during service. Experimental results using MemoryBench reveal that current state-of-the-art baselines struggle to effectively and efficiently learn in this continual learning setting.

Key Contribution

LLMs still struggle to learn effectively from user feedback during service, as revealed by a new benchmark spanning multiple domains and languages.

Abstract

Scaling up data, parameters, and test-time computation has been the mainstream approach to improving LLM systems (LLMsys), but its upper bound has almost been reached due to the gradual depletion of high-quality data and the marginal gains obtained from larger computational resource consumption. Inspired by the abilities of humans and traditional AI systems to learn from practice, constructing memory and continual learning frameworks for LLMsys has become an important and popular research direction in recent literature. Yet existing benchmarks for LLM memory often focus on evaluating the system on homogeneous reading comprehension tasks with long-form inputs rather than testing its ability to learn from accumulated user feedback during service time. Therefore, we propose a user feedback simulation framework and a comprehensive benchmark covering multiple domains, languages, and types of tasks to evaluate the continual learning abilities of LLMsys. Experiments show that the effectiveness and efficiency of state-of-the-art baselines are far from satisfactory, and we hope this benchmark can pave the way for future studies on LLM memory and optimization algorithms.

Data Curation & Synthetic Data | Eval Frameworks & Benchmarks | Natural Language Processing

Citation Metrics

Citations: 11

Influential citations: 3

References: 64

Year: 2025

Venue: arXiv.org
