Georgia Tech | Jun 15, 2026 | arXiv:2606.17034

KVEraser: Learning to Steer KV Cache for Efficient Localized Context Erasing

Mufei Li, Shikun Liu, Dongqi Fu, Haoyu Wang, Yinglong Xia, Hong Li, Hong Yan, Pan Li

AI Summary

KVEraser is a learned KV-cache editing method for removing a span from an already-processed context in long-context LLMs. The difficulty is that a local edit has global consequences: exact erasure must recompute every token after the deleted span, so its cost scales with suffix length rather than with the length of the erased span. KVEraser instead replaces only the KV states of the erased interval with learned steering states and reuses the rest of the cache unchanged. On in-domain tasks it nearly matches full recomputation in post-erasure performance while increasing latency by only 24%, versus a 17.6x increase for full recomputation. On unseen long-document QA tasks it achieves the best performance among approximate baselines.

Key Contribution

KVEraser erases a span from the KV cache by rewriting only the states of the erased interval, nearly matching full recomputation in post-erasure performance on in-domain tasks while increasing latency by 24% rather than 17.6x.

Abstract

Post-hoc context erasing over the KV cache is challenging because a local edit has a global consequence: once a span has been processed, its influence propagates into the cached states of all subsequent tokens. This issue arises naturally in long-context LLM applications, where stale retrieved facts, incorrect tool observations, retracted user preferences, or harmful prompt injections may be identified only after prefill. Exact erasing must then recompute all tokens after the deleted span, making its computational cost depend on suffix length rather than erased-span length. We introduce KVEraser, a learned KV-cache editing method for efficient localized context erasing. Given a processed context and a span to remove, KVEraser replaces only the KV states of the erased interval with learned steering states while reusing the remaining cache unchanged. To learn a transferable erasing mechanism, we build a two-stage training pipeline: generic span-neighbor pre-training teaches the eraser to suppress the influence of the erased span, while task-specific fine-tuning adapts this capability to downstream scenarios. Experiments show that KVEraser nearly matches full recomputation in post-erasure performance on in-domain tasks across 1K–32K context lengths, while its latency increases by only 24% compared with a 17.6x increase for full recomputation. KVEraser also generalizes to unseen long-document QA tasks with harmful factual distractors, achieving the best performance among approximate baselines with a 3–4x speedup over full recomputation.
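The cache edit described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the function names are hypothetical, the cache is reduced to one flat key/value pair per position, and the steering states are zero placeholders standing in for the output of the learned eraser, which is not shown. The sketch also assumes one steering state per erased position, which the abstract does not specify.

```python
from typing import List, Tuple

# One cached position: (key vector, value vector). Real caches hold
# per-layer, per-head tensors; flat lists keep the sketch dependency-free.
KV = Tuple[List[float], List[float]]


def erase_span(cache: List[KV], start: int, end: int, steering: List[KV]) -> List[KV]:
    """Return a cache where positions [start, end) hold steering states.

    Assumption: one steering state per erased position, so the positions
    of the prefix and suffix entries are unchanged.
    """
    if not 0 <= start <= end <= len(cache):
        raise ValueError("span must lie inside the cache")
    if len(steering) != end - start:
        raise ValueError("need one steering state per erased position")
    # Prefix and suffix entries are reused as-is (same objects, no recompute).
    return cache[:start] + list(steering) + cache[end:]


def positions_rewritten_localized(start: int, end: int) -> int:
    """Localized editing rewrites only the erased interval."""
    return end - start


def positions_recomputed_exact(cache_len: int, end: int) -> int:
    """Exact erasing recomputes every token after the deleted span."""
    return cache_len - end


# Toy cache of 8 positions; erase positions 2 and 3.
cache = [([float(i)], [float(i)]) for i in range(8)]
steering = [([0.0], [0.0]) for _ in range(2)]  # placeholders, not learned states
edited = erase_span(cache, 2, 4, steering)
```

The two counting helpers make the cost argument concrete: the localized edit touches a number of positions equal to the erased-span length, while exact erasing touches every position after the span, so the gap widens as the suffix grows.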

Inference & Quantization | Natural Language Processing


Year: 2026

Venue: N/A
