This paper introduces an adaptive context compression framework for LLMs that dynamically manages context length by integrating importance-aware memory selection, coherence-sensitive filtering, and dynamic budget allocation. The framework is evaluated on LOCOMO, LOCCO, and LongBench, showing improvements in conversational stability and retrieval accuracy. The approach also reduces token usage and inference latency compared to existing methods, demonstrating a better balance between memory preservation and computational efficiency.
LLMs can maintain conversational stability and improve retrieval accuracy in long-running interactions by adaptively compressing context, leading to reduced token usage and faster inference.
Large Language Models (LLMs) often experience performance degradation during long-running interactions due to increasing context length, memory saturation, and computational overhead. This paper presents an adaptive context compression framework that integrates importance-aware memory selection, coherence-sensitive filtering, and dynamic budget allocation to retain essential conversational information while controlling context growth. The approach is evaluated on the LOCOMO, LOCCO, and LongBench benchmarks to assess answer quality, retrieval accuracy, coherence preservation, and efficiency. Experimental results demonstrate that the proposed method achieves consistent improvements in conversational stability and retrieval performance while reducing token usage and inference latency compared with existing memory- and compression-based approaches. These findings indicate that adaptive context compression provides an effective balance between long-term memory preservation and computational efficiency in persistent LLM interactions.
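The three stages named in the abstract can be sketched as a simple pipeline. The sketch below is an illustrative assumption, not the paper's implementation: the scoring functions, the `coherence_floor` threshold, and the word-count token proxy are all hypothetical stand-ins for whatever the framework actually uses.

```python
# Hypothetical sketch of the three stages: importance-aware selection,
# coherence-sensitive filtering, and dynamic budget allocation.
# Scores and the token proxy are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class MemoryItem:
    text: str
    importance: float   # assumed per-item relevance score in [0, 1]
    coherence: float    # assumed score: how much the item supports dialogue flow

def compress_context(items, token_budget, coherence_floor=0.3):
    """Keep the most important items that pass a coherence filter,
    greedily filling a dynamic token budget."""
    # Stages 1-2: filter out items that would hurt coherence, rank the rest.
    kept = [m for m in items if m.coherence >= coherence_floor]
    kept.sort(key=lambda m: m.importance, reverse=True)
    # Stage 3: greedily fill the budget by (approximate) token cost.
    selected, used = [], 0
    for m in kept:
        cost = len(m.text.split())  # crude token-count proxy
        if used + cost <= token_budget:
            selected.append(m)
            used += cost
    return selected

history = [
    MemoryItem("User prefers metric units.", importance=0.9, coherence=0.8),
    MemoryItem("Small talk about the weather.", importance=0.1, coherence=0.9),
    MemoryItem("Project deadline is Friday.", importance=0.8, coherence=0.7),
]
compressed = compress_context(history, token_budget=10)
```

Under these assumptions, the low-importance small-talk item is dropped once the budget is exhausted, while both high-importance facts are retained.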