This paper introduces Semi-Dynamic Context Compression, a method for compressing long contexts in LLMs that uses a discrete ratio selector to predict a compression ratio from the input's information density and quantize it to a predefined set of values. The authors address the instability of continuous, input-dependent structural hyperparameters by restricting the ratio to this discrete set. Experiments show that the density-aware approach outperforms static compression baselines, achieving a better trade-off between compression and performance.
LLMs can maintain performance while processing longer contexts, thanks to a new compression method that intelligently adjusts the compression ratio based on the information density of the input.
Soft context compression reduces the computational workload of processing long contexts in LLMs by encoding a long context into a smaller number of latent tokens. However, existing frameworks apply uniform compression ratios, failing to account for the extreme variance in natural language information density. While adopting a density-aware dynamic compression ratio seems intuitive, empirical investigations reveal that models struggle intrinsically with operations parameterized by input-dependent, continuous structural hyperparameters. To resolve this pitfall, we introduce the Semi-Dynamic Context Compression framework. Our approach features a Discrete Ratio Selector, which predicts a compression target based on intrinsic information density and quantizes it to a predefined set of discrete compression ratios. The selector is trained jointly and efficiently with the compressor on synthetic data, using summary lengths as a proxy to create labels for compression ratio prediction. Extensive evaluations confirm that our density-aware framework, using mean pooling as the backbone, consistently outperforms static baselines, establishing a robust Pareto frontier for context compression techniques. Our code, data and model weights are available at https://github.com/yuyijiong/semi-dynamic-context-compress
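The pipeline described in the abstract (predict a ratio, snap it to a discrete set, then mean-pool the context) can be illustrated with a minimal sketch. This is not the authors' implementation. The ratio set `DISCRETE_RATIOS`, the nearest-value quantization rule, the label formula in `proxy_ratio_label`, and all function names are assumptions made for illustration. In the actual framework the ratio would come from a learned selector rather than being passed in by hand.

```python
from typing import List, Sequence, Tuple

# Hypothetical ratio set; the abstract does not list the actual values.
DISCRETE_RATIOS: Tuple[int, ...] = (2, 4, 8, 16)


def quantize_ratio(predicted: float, ratios: Sequence[int] = DISCRETE_RATIOS) -> int:
    """Snap a continuous predicted ratio to the nearest allowed ratio.

    Nearest-in-linear-distance is an assumed rule, not taken from the paper.
    """
    return min(ratios, key=lambda r: abs(r - predicted))


def proxy_ratio_label(context_len: int, summary_len: int) -> int:
    """Assumed labeling scheme: context length over summary length, quantized.

    The abstract only says summary lengths serve as a proxy for labels;
    this particular formula is a guess for illustration.
    """
    if summary_len <= 0:
        raise ValueError("summary_len must be positive")
    return quantize_ratio(context_len / summary_len)


def mean_pool(embeddings: List[List[float]], ratio: int) -> List[List[float]]:
    """Average each consecutive group of `ratio` token vectors into one latent vector.

    A trailing group shorter than `ratio` is averaged over its own length.
    """
    pooled = []
    for start in range(0, len(embeddings), ratio):
        group = embeddings[start:start + ratio]
        dim = len(group[0])
        pooled.append([sum(vec[d] for vec in group) / len(group) for d in range(dim)])
    return pooled


def compress(embeddings: List[List[float]], predicted: float) -> Tuple[int, List[List[float]]]:
    """Quantize the predicted ratio, then mean-pool the context with it."""
    ratio = quantize_ratio(predicted)
    return ratio, mean_pool(embeddings, ratio)


if __name__ == "__main__":
    # Eight toy token embeddings of dimension 2.
    tokens = [[float(i), float(2 * i)] for i in range(8)]
    ratio, latents = compress(tokens, predicted=3.7)
    print(ratio, len(latents))
```

Because the ratio can take only a handful of values, the compressor sees a small, fixed family of pooling configurations during training instead of a continuously varying one, which is the instability the abstract says the discrete selector is meant to avoid.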