SJTU | Mar 16, 2026 | arXiv:2603.15129

Next-Frame Decoding for Ultra-Low-Bitrate Image Compression with Video Diffusion Priors

Yunuo Chen, Chuqin Zhou, Jiangchuan Li, Xiaoyue Ling, Bing He, Jincheng Dai, Li Song, Guo Lu

AI Summary

This paper introduces an ultra-low-bitrate image compression (ULB-IC) method that uses a pretrained video diffusion model (VDM) to model the transition from a compact "anchor frame" to the final reconstructed image. By reinterpreting generative decoding as a next-frame prediction task conditioned on the anchor frame, the method improves fidelity and realism compared to image diffusion-based approaches. Experiments on the CLIC2020 test set show over 50% bitrate savings across LPIPS, DISTS, FID, and KID compared to DiffC, along with a decoding speedup of up to 5x.

Key Contribution

Achieves over 50% bitrate savings relative to DiffC on the CLIC2020 test set in ultra-low-bitrate image compression by recasting image decoding as a next-frame prediction problem that uses video diffusion priors.

Abstract

We present a novel paradigm for ultra-low-bitrate image compression (ULB-IC) that exploits the "temporal" evolution in generative image compression. Specifically, we define an explicit intermediate state during decoding: a compact anchor frame, which preserves the scene geometry and semantic layout while discarding high-frequency details. We then reinterpret generative decoding as a virtual temporal transition from this anchor to the final reconstructed image. To model this progression, we leverage a pretrained video diffusion model (VDM) as a temporal prior: the anchor frame serves as the initial frame and the original image as the target frame, transforming the decoding process into a next-frame prediction task. In contrast to image diffusion-based ULB-IC models, our decoding proceeds from a visible, semantically faithful anchor, which improves both fidelity and realism for perceptual image compression. Extensive experiments demonstrate that our method achieves superior objective and subjective performance. On the CLIC2020 test set, our method achieves over 50% bitrate savings across LPIPS, DISTS, FID, and KID compared to DiffC, while also delivering a significant decoding speedup of up to 5×. Code will be released later.
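To make the two-stage idea in the abstract concrete, the following is a minimal toy sketch, not the paper's implementation. It assumes a grayscale image stored as nested lists. The names `make_anchor` and `decode_next_frame` are hypothetical. Block averaging stands in for the anchor (coarse layout kept, fine detail discarded), and a plain callable stands in for the video diffusion model that predicts the next frame from the anchor. The actual anchor construction and VDM conditioning are not described in this abstract.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel values


def make_anchor(image: Image, block: int) -> Image:
    """Toy anchor: replace each block x block tile by its mean value.

    Assumption for illustration only: this keeps the coarse layout and
    removes fine detail. It is not the paper's anchor construction.
    """
    h, w = len(image), len(image[0])
    anchor = [[0.0] * w for _ in range(h)]
    for top in range(0, h, block):
        for left in range(0, w, block):
            rows = range(top, min(top + block, h))
            cols = range(left, min(left + block, w))
            total = sum(image[r][c] for r in rows for c in cols)
            mean = total / (len(rows) * len(cols))
            for r in rows:
                for c in cols:
                    anchor[r][c] = mean
    return anchor


def decode_next_frame(anchor: Image,
                      predictor: Callable[[Image], Image]) -> Image:
    """Treat the anchor as frame 0 and ask `predictor` for frame 1.

    `predictor` is a hypothetical stand-in for a pretrained video
    diffusion model; any function mapping an image to an image works.
    """
    return predictor(anchor)


if __name__ == "__main__":
    image = [[0.0, 2.0], [4.0, 6.0]]
    anchor = make_anchor(image, block=2)
    # An identity predictor simply returns the anchor unchanged.
    print(decode_next_frame(anchor, lambda frame: frame))
```

In this toy setting the predictor is where all detail synthesis would happen; the sketch only shows the interface implied by "anchor frame as initial frame, reconstruction as next frame".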

Computer Vision Inference & Quantization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
