CMU ML | Columbia | Harvard | LLNL | Princeton | UMD
May 25, 2026 | arXiv:2605.26099

Language Models Need Sleep

Sangyun Lee, Sean McLeish, Tom Goldstein, Giulia Fanti

AI Summary

The paper introduces a sleep-like consolidation mechanism for LLMs to address the poor scaling of attention with context length. This method periodically converts recent context into persistent fast weights within the SSM blocks of the model via offline recurrent passes during a "sleep" phase. Results show improved performance on synthetic tasks and a math reasoning task where standard transformers and SSM-attention hybrids fail, with longer sleep durations leading to better performance on complex reasoning.

Key Contribution

LLMs can use a "sleep" phase to distill long contexts into fast weights, improving performance on reasoning tasks while preserving wake-time inference latency.

Abstract

Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache. During sleep, the model performs N offline recurrent passes over the accumulated context and updates the fast weights in its state-space model (SSM) blocks through a learned local rule. During inference, this shifts extra computation to sleep while preserving the latency of wake-time prediction. We test our method on controlled synthetic tasks, including cellular automata and multi-hop graph retrieval, as well as a realistic math reasoning task, on which a regular transformer as well as SSM-attention hybrid models fail. We then show that increasing sleep duration N for our models improves performance, with the largest gains on examples that require deeper reasoning.
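
To make the sleep/wake cycle described above concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation: the class `ToySleeper`, its methods, the learning rate, and the delta-rule update are all hypothetical stand-ins chosen for illustration, and the paper's actual rule is learned and lives inside SSM blocks. The sketch only shows the control flow: accumulate context in a cache, run N offline passes that write it into a fast-weight matrix, then clear the cache.

```python
# Toy illustration only. The delta rule below is an assumed stand-in for the
# paper's learned local rule; all names here are hypothetical.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class ToySleeper:
    def __init__(self, dim, lr=0.5):
        self.dim = dim
        self.lr = lr
        # Persistent "fast weights" that survive after the cache is cleared.
        self.fast_weights = [[0.0] * dim for _ in range(dim)]
        # Wake-time context, analogous to a key-value cache.
        self.kv_cache = []

    def observe(self, key, value):
        """Wake phase: store a (key, value) pair in the cache."""
        self.kv_cache.append((key, value))

    def sleep(self, n_passes):
        """Sleep phase: n_passes offline passes over the cache, then clear it."""
        for _ in range(n_passes):
            for key, value in self.kv_cache:
                pred = matvec(self.fast_weights, key)
                for i in range(self.dim):
                    err = value[i] - pred[i]
                    for j in range(self.dim):
                        # Local delta-rule update: W += lr * (v - W k) k^T
                        self.fast_weights[i][j] += self.lr * err * key[j]
        self.kv_cache.clear()

    def recall(self, query):
        """Wake phase after sleep: read from fast weights only."""
        return matvec(self.fast_weights, query)


def recall_error(n_passes):
    """Return (max recall error, cache size) after sleeping for n_passes."""
    pairs = [
        ([1.0, 0.0, 0.0], [0.0, 2.0, 0.0]),
        ([0.0, 1.0, 0.0], [1.0, 0.0, -1.0]),
    ]
    model = ToySleeper(dim=3)
    for key, value in pairs:
        model.observe(key, value)
    model.sleep(n_passes)
    err = max(
        abs(r - t)
        for key, value in pairs
        for r, t in zip(model.recall(key), value)
    )
    return err, len(model.kv_cache)


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, recall_error(n))
```

In this toy setting the recall error shrinks as the number of sleep passes grows, and the cache is empty after sleep, so wake-time reads touch only the fixed-size fast weights. That mirrors the qualitative trade-off in the abstract (more computation during sleep, unchanged cost at wake time), but it says nothing about the magnitude of the gains reported in the paper.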

Architecture Design (Transformers, SSMs, MoE) | Inference & Quantization | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
