HKU, Kuaishou | May 29, 2026 | arXiv:2605.31336

DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory

Zhenhao Yang, Xiaoshi Wu, Zhengyao Lv, Xiaoyu Shi, Xintao Wang, Pengfei Wan, Kun Gai, Kwan-Yee K. Wong

AI Summary

The paper introduces DecMem, a decoupled memory architecture for video generation that addresses the limitations of computational inefficiency and attention dispersion in long-horizon extrapolation. DecMem uses Sparse Global Memory for efficient access to global history and Anchored Local Memory for stable, high-quality extrapolation. Experiments show DecMem significantly outperforms state-of-the-art methods, enabling minute-level controllable video generation with improved fidelity and consistency.

Key Contribution

A decoupled memory architecture that separates global and local memory access, enabling minute-long video generation with consistent content.

Abstract

Recent advances in video generative models have promoted rapid progress in controllable world models. However, maintaining fine-grained spatio-temporal consistency under long-horizon reasoning remains a key challenge. In this work, we move beyond explicit 3D memory and coarse frame-level implicit modeling, and propose a fine-grained, learnable, and scalable memory for consistent world generation. We first identify two fundamental limitations of naïve learnable memory architectures in long-horizon extrapolation, namely computational inefficiency and attention dispersion. Through a systematic analysis of attention dispersion, we propose DecMem, a decoupled memory architecture that employs Sparse Global Memory for efficient fine-grained access to global history and Anchored Local Memory for stable and high-quality extrapolation. Extensive experiments demonstrate that DecMem significantly outperforms current state-of-the-art methods. By ensuring precise and efficient long-term memory and achieving superior extrapolation capabilities, DecMem enables minute-level controllable long video generation with high fidelity and consistency.
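The abstract names attention dispersion and sparse access to global history without giving the mechanism. The sketch below is a generic illustration, not DecMem's method: it assumes dispersion means that softmax attention spreads weight more thinly as the memory grows, and it uses plain top-k key selection as a stand-in for sparse access. All names (`dense_attention_weights`, `topk_attention_weights`, `demo`) and the toy data are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def dense_attention_weights(query, keys):
    """Scaled dot-product attention weights over every memory key."""
    scale = math.sqrt(len(query))
    return softmax([dot(query, key) / scale for key in keys])


def topk_attention_weights(query, keys, k):
    """Attend only to the k highest-scoring keys; all others get weight 0."""
    scale = math.sqrt(len(query))
    scores = [dot(query, key) / scale for key in keys]
    keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    kept_weights = softmax([scores[i] for i in keep])
    weights = [0.0] * len(keys)
    for i, w in zip(keep, kept_weights):
        weights[i] = w
    return weights


def demo(memory_sizes=(16, 256, 4096), dim=8, k=8, seed=0):
    """Track the weight given to one relevant key as the memory grows.

    Toy setup (an assumption for illustration only): key 0 is aligned with
    the query, and every other key is a bounded random distractor.
    """
    rng = random.Random(seed)
    query = [1.0] * dim
    relevant = [2.0] * dim
    distractors = [
        [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        for _ in range(max(memory_sizes) - 1)
    ]
    results = []
    for n in memory_sizes:
        keys = [relevant] + distractors[: n - 1]  # nested, growing memory
        dense = dense_attention_weights(query, keys)[0]
        sparse = topk_attention_weights(query, keys, k)[0]
        results.append((n, dense, sparse))
    return results


if __name__ == "__main__":
    for n, dense, sparse in demo():
        print(f"memory={n:5d}  dense={dense:.4f}  top-k={sparse:.4f}")
```

In this toy setting the dense weight on the relevant key shrinks as distractors are added, while the top-k weight stays bounded below because at most `k - 1` distractors compete with it. How DecMem actually selects memory entries, and what Anchored Local Memory anchors to, is not stated in the abstract.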

Architecture Design (Transformers, SSMs, MoE) | Computer Vision | World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
