This paper introduces FadeMem, a distance-aware KV memory consolidation mechanism for autoregressive video diffusion that organizes historical KV caches into a temporal hierarchy under a fixed cache budget. Motivated by frequency-dependent temporal decay, FadeMem progressively merges older adjacent memory entries under a power-law allocation schedule while keeping recent context fine-grained, which yields a dense-near, sparse-far memory. Experiments show improved subject consistency, background stability, and temporal coherence compared with existing bounded-cache methods.
Memory consolidation that adapts to temporal distance may help autoregressive video generators stay coherent over long sequences without letting the KV cache grow with video length.
Autoregressive video generators synthesize long videos by generating successive temporal segments, but their historical KV cache grows with video length. Existing bounded-cache methods reduce this cost with local windows, sink tokens, or compressed memory states, yet they usually assign fixed roles to different parts of the history. We propose FadeMem, a distance-aware KV memory consolidation mechanism that organizes historical KV blocks into a temporal hierarchy under a fixed cache budget. This design is motivated by frequency-dependent temporal decay: fine details decorrelate quickly, while coarse scene structure and identity remain useful over longer horizons. During generation, new history is inserted as fine-grained entries, while older adjacent entries are progressively merged under a power-law temporal allocation schedule, yielding a dense-near, sparse-far memory within one cache. Without architectural changes, FadeMem preserves recent context for short-term dynamics and compact long-range anchors for identity and scene coherence. Experiments show improved subject consistency, background stability, and temporal coherence over existing bounded-cache strategies.
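The abstract does not specify FadeMem's exact merge operator or schedule, so the following is only a minimal sketch of a dense-near, sparse-far cache under stated assumptions. Each cache slot holds a key/value pair (plain float lists standing in for KV tensors) plus a `span` counting how many original blocks it covers. Adjacent slots are merged by span-weighted averaging (an assumption), and the power-law schedule is modeled as a target span `(1 + d) ** alpha` that grows with distance `d` from the newest block (also an assumption). When the cache exceeds its budget, the adjacent pair whose merged span is smallest relative to that target is folded together, and the `keep_recent` newest slots are never merged. All names here (`DistanceAwareCache`, `MemoryEntry`, `target_span`, `alpha`, `keep_recent`) are hypothetical and are not the paper's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemoryEntry:
    """One cache slot: a (possibly merged) KV block covering `span` original blocks."""
    key: List[float]
    value: List[float]
    span: int  # number of original KV blocks folded into this slot


def merge(a: MemoryEntry, b: MemoryEntry) -> MemoryEntry:
    """Hypothetical merge rule: span-weighted average of keys and values."""
    total = a.span + b.span

    def avg(x: List[float], y: List[float]) -> List[float]:
        return [(a.span * xi + b.span * yi) / total for xi, yi in zip(x, y)]

    return MemoryEntry(avg(a.key, b.key), avg(a.value, b.value), total)


class DistanceAwareCache:
    """Fixed-budget memory: fine-grained near the present, coarse far away.

    Entries are stored oldest-first. The assumed power-law schedule is
    target_span(d) = (1 + d) ** alpha, where d is the distance, in original
    blocks, from the newest block.
    """

    def __init__(self, budget: int, alpha: float = 1.0, keep_recent: int = 2):
        assert budget > keep_recent >= 1
        self.budget, self.alpha, self.keep_recent = budget, alpha, keep_recent
        self.entries: List[MemoryEntry] = []

    def target_span(self, distance: float) -> float:
        return (1.0 + distance) ** self.alpha

    def insert(self, key: List[float], value: List[float]) -> None:
        # New history always enters as a fine-grained slot (span 1).
        self.entries.append(MemoryEntry(list(key), list(value), 1))
        while len(self.entries) > self.budget:
            self._consolidate_once()

    def _consolidate_once(self) -> None:
        # Distance of each slot's centre from the newest block (newest = 0).
        n = len(self.entries)
        distances, covered = [0.0] * n, 0
        for i in range(n - 1, -1, -1):
            span = self.entries[i].span
            distances[i] = covered + (span - 1) / 2
            covered += span
        # Candidate pairs exclude the `keep_recent` newest slots; a pair whose
        # merged span is small relative to its distance-based target is cheap.
        best_i, best_cost = 0, float("inf")
        for i in range(n - self.keep_recent - 1):
            merged_span = self.entries[i].span + self.entries[i + 1].span
            pair_distance = (distances[i] + distances[i + 1]) / 2
            cost = merged_span / self.target_span(pair_distance)
            if cost < best_cost:
                best_i, best_cost = i, cost
        a, b = self.entries[best_i], self.entries[best_i + 1]
        self.entries[best_i : best_i + 2] = [merge(a, b)]

    def spans(self) -> List[int]:
        return [e.span for e in self.entries]


if __name__ == "__main__":
    cache = DistanceAwareCache(budget=4, alpha=1.0, keep_recent=2)
    for t in range(10):  # ten 1-D "blocks" standing in for KV tensors
        cache.insert([float(t)], [float(t)])
    print(cache.spans())  # [4, 4, 1, 1]
```

In this toy run, ten blocks pass through a four-slot cache: the two newest blocks stay at full resolution while the eight older ones are folded into two coarse slots, so the budget stays fixed as history grows. Within this sketch, a larger `alpha` lowers the merge cost of distant pairs faster, biasing consolidation more strongly toward old history.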