KVEraser is a learned method for erasing a span from the KV cache of a long-context LLM after the context has already been processed. The difficulty is that a local edit has global consequences. The erased span has already influenced the cached states of every later token, so exact erasure must recompute the whole suffix, and its cost scales with suffix length rather than span length. KVEraser instead replaces only the KV states of the erased interval with learned steering states and reuses the rest of the cache unchanged. On in-domain tasks across 1K–32K contexts, it nearly matches full recomputation in post-erasure performance, while its latency increases by only 24% compared with a 17.6x increase for full recomputation. On unseen long-document QA tasks with harmful factual distractors, it outperforms the approximate baselines and runs 3–4x faster than full recomputation.
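To make the scaling argument concrete, the sketch below counts how many token positions need their KV states rewritten under each strategy. It is a simplified model, not the paper's cost analysis. It counts positions rather than FLOPs, uses half-open 0-based span indices, and assumes (hypothetically) that the span-replacement approach writes one steering state per erased position. The function name `positions_to_recompute` is illustrative.

```python
def positions_to_recompute(context_len: int, span_start: int, span_end: int) -> dict:
    """Count token positions whose KV states must be rewritten.

    Simplified model: span_start is inclusive and span_end is exclusive
    (0-based). Counts positions, not FLOPs.
    """
    if not 0 <= span_start < span_end <= context_len:
        raise ValueError("span must lie inside the context")
    span_len = span_end - span_start
    suffix_len = context_len - span_end
    return {
        # Exact erasure: every token after the span attended to it,
        # so the entire suffix must be re-encoded.
        "full_recompute": suffix_len,
        # Span replacement (assumed one steering state per erased position):
        # only the erased interval is rewritten; prefix and suffix are reused.
        "span_replacement": span_len,
    }


# A 200-token span erased early in a 32K-token context.
costs = positions_to_recompute(32000, 1000, 1200)
print(costs)
```

Here exact erasure re-encodes 30,800 positions while span replacement rewrites 200. The gap narrows only when the erased span sits close to the end of the context.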
KVEraser nearly matches the quality of full recomputation at a fraction of its latency cost, which makes it practical to remove stale or harmful information from the KV cache after prefill.
Post-hoc context erasing over the KV cache is challenging because a local edit has a global consequence: once a span has been processed, its influence propagates into the cached states of all subsequent tokens. This issue arises naturally in long-context LLM applications, where stale retrieved facts, incorrect tool observations, retracted user preferences, or harmful prompt injections may be identified only after prefill. Exact erasing must then recompute all tokens after the deleted span, making its computational cost depend on suffix length rather than erased-span length. We introduce KVEraser, a learned KV-cache editing method for efficient localized context erasing. Given a processed context and a span to remove, KVEraser replaces only the KV states of the erased interval with learned steering states while reusing the remaining cache unchanged. To learn a transferable erasing mechanism, we build a two-stage training pipeline: generic span-neighbor pre-training teaches the eraser to suppress the influence of the erased span, while task-specific fine-tuning adapts this capability to downstream scenarios. Experiments show that KVEraser nearly matches full recomputation in post-erasure performance on in-domain tasks across 1K–32K context lengths, while its latency increases by only 24% compared with a 17.6x increase for full recomputation. KVEraser also generalizes to unseen long-document QA tasks with harmful factual distractors, achieving the best performance among approximate baselines with a 3–4x speedup over full recomputation.
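Below is a minimal sketch of the cache edit itself, not the paper's implementation. It assumes a single-layer, single-head cache stored as a Python list of (key, value) pairs, one per token position. The `eraser` argument is a hypothetical stand-in for the learned module, whose actual inputs (for example, the surrounding context) are not specified here. The one-steering-state-per-position check is also an assumption, chosen so that suffix positions stay aligned. The sketch shows only the structural point: the erased interval is overwritten and the prefix and suffix entries are reused as-is.

```python
from typing import Callable, List, Tuple

# (key vector, value vector) for one token position in a toy single-head cache.
KV = Tuple[List[float], List[float]]


def erase_span(
    cache: List[KV],
    start: int,
    end: int,
    eraser: Callable[[List[KV]], List[KV]],
) -> List[KV]:
    """Return a new cache with positions [start, end) replaced by steering states.

    `eraser` is a hypothetical callable standing in for a learned module.
    """
    if not 0 <= start < end <= len(cache):
        raise ValueError("span out of range")
    span = cache[start:end]
    steering = eraser(span)
    if len(steering) != len(span):
        # Assumption of this sketch: one steering state per erased position.
        raise ValueError("expected one steering state per erased position")
    # Prefix and suffix entries are reused unchanged: no forward pass over the suffix.
    return cache[:start] + list(steering) + cache[end:]


def zero_eraser(span: List[KV]) -> List[KV]:
    # Toy stand-in, NOT a learned eraser: zero vectors of the same width.
    return [([0.0] * len(k), [0.0] * len(v)) for k, v in span]


cache = [([float(i + 1)], [float(i + 1)]) for i in range(6)]
edited = erase_span(cache, 2, 4, zero_eraser)
print([k[0] for k, _ in edited])
```

Because the suffix entries are the same objects as before the edit, the suffix never passes through the model again. In a real model, the quality of the result depends on how well the learned steering states suppress the erased span's influence on those reused states, which is what the two-stage training pipeline targets.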