This paper introduces SAMIC, a perceptual image compression method that uses Mamba-based state space models for efficient long-range modeling. To address the limitations of the fixed scanning orders used in existing methods, the authors propose a semantic-aware Mamba block (SAMB) that dynamically adjusts scanning based on clustered semantic features. They also introduce an SVD-inspired redundancy reduction module (SVD-RRM) that reduces channel-wise redundancy in the encoder. The method is reported to perform favorably against state-of-the-art approaches in rate-distortion-perception tradeoff, with lower model complexity.
Mamba's linear complexity meets perceptual image compression, yielding a lightweight model that rivals GANs and diffusion models in visual quality while being far more efficient.
Perceptual image compression focuses on preserving high visual quality under low-bitrate constraints. Most existing approaches to perceptual compression leverage the strong generative capabilities of generative adversarial networks or diffusion models, at the cost of substantial model complexity. To this end, we present an efficient perceptual image compression method that exploits the long-range modeling capability and linear computational complexity of state space models, with a particular focus on Mamba. Unlike existing methods that rely on an inherently fixed scanning order and consequently impair semantic continuity and spatial correlation, we develop a semantic-aware Mamba block (SAMB) to enable scanning guided by dynamically clustered semantic features, thereby alleviating the strict causality constraints and long-range information decay inherent to Mamba. Inspired by singular value decomposition, we design an SVD-inspired redundancy reduction module (SVD-RRM) that performs a low-rank approximation on the latent features by introducing a learnable soft threshold, thereby reducing channel-wise redundancy. The proposed SAMB is integrated into both the encoder and decoder of the compression framework, whereas the SVD-RRM is incorporated only in the encoder. Extensive experiments demonstrate that our method performs favorably against state-of-the-art approaches in terms of rate-distortion-perception tradeoff and model complexity. The source code and pretrained models will be available at https://github.com/Jasmine-aiq/SAMIC.
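The low-rank approximation behind SVD-RRM can be illustrated with classical singular value soft-thresholding. The following is a minimal sketch under stated assumptions, not the authors' module: the function name `soft_threshold_low_rank`, the (channels, positions) layout of the latent, and the fixed value of `tau` are all hypothetical. In the paper the threshold is learnable and the module is only described as SVD-inspired, so it may not compute an explicit SVD at all.

```python
import numpy as np


def soft_threshold_low_rank(features, tau):
    """Low-rank approximation by soft-thresholding singular values.

    features: array of shape (C, N), C channels by N spatial positions
              (an assumed layout, chosen for illustration).
    tau:      non-negative threshold; fixed here, whereas the paper
              describes a learnable soft threshold.
    Returns the approximated features and the number of singular
    values that survive the threshold.
    """
    u, s, vt = np.linalg.svd(features, full_matrices=False)
    # Shrink every singular value by tau and clip at zero.
    s_shrunk = np.maximum(s - tau, 0.0)
    # Scale the columns of u by the shrunk values, then recompose.
    approx = (u * s_shrunk) @ vt
    rank = int(np.count_nonzero(s_shrunk))
    return approx, rank


# Synthetic latent: a rank-2 signal across 8 channels plus small noise.
rng = np.random.default_rng(0)
basis = rng.standard_normal((8, 2))
coeffs = rng.standard_normal((2, 64))
latent = basis @ coeffs + 0.01 * rng.standard_normal((8, 64))

approx, rank = soft_threshold_low_rank(latent, tau=1.0)
print("surviving singular values:", rank)
print("relative error:", np.linalg.norm(latent - approx) / np.linalg.norm(latent))
```

Components whose singular values fall below `tau` are removed entirely, which is what discards redundant channel directions, while the remaining components are shrunk rather than kept unchanged. Setting `tau` to zero returns the input unmodified.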