May 6, 2026 · arXiv:2605.05148

What Matters in Practical Learned Image Compression

Kedar Tatwawadi, Parisa Rahimzadeh, Zhanghao Sun, Zhiqi Chen, Ziyun Yang, Sanjay Nair, Divija Hasteer, Oren Rippel

AI Summary

This paper investigates key modeling choices for learned image compression, focusing on perceptual quality and runtime efficiency. The authors perform neural architecture search over millions of backbone configurations to identify models that balance on-device runtime against perceptual compression performance. In subjective user studies, the resulting codec achieves 2.3-3x bitrate savings against AV1, AV2, VVC, ECM and JPEG-AI, and 20-40% savings against the best learned alternatives. On an iPhone 17 Pro Max, it encodes 12MP images in as little as 230ms and decodes them in 150ms.
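The idea of a runtime-constrained architecture search can be illustrated with a toy example. This is a minimal sketch, not the authors' method. The search knobs (`depth`, `width`, `kernel`), the cost model `latency_ms`, and the score `quality_proxy` are all hypothetical stand-ins. A real search would measure latency on the target device, score candidates with perceptual metrics, and use a strategy that scales to millions of configurations rather than an exhaustive grid.

```python
import itertools
import math
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Backbone:
    depth: int   # residual blocks per stage (hypothetical knob)
    width: int   # channels per layer (hypothetical knob)
    kernel: int  # convolution kernel size (hypothetical knob)


def latency_ms(b: Backbone) -> float:
    """Toy on-device cost model (hypothetical): cost grows linearly with
    depth, width and kernel area. A real search would use measured latencies."""
    return b.depth * b.width * b.kernel ** 2 / 100.0


def quality_proxy(b: Backbone) -> float:
    """Toy stand-in for a perceptual-metric score (hypothetical):
    higher is better, with diminishing returns in each knob."""
    return math.log1p(b.depth) + math.log1p(b.width / 32) + 0.1 * b.kernel


def search_space() -> Iterable[Backbone]:
    # Exhaustive grid over a tiny space (32 candidates); the search
    # described in the paper covers millions of configurations.
    for d, w, k in itertools.product([2, 4, 6, 8], [32, 64, 96, 128], [3, 5]):
        yield Backbone(d, w, k)


def best_under_budget(budget_ms: float) -> Optional[Backbone]:
    """Return the highest-scoring backbone whose estimated latency fits
    the runtime budget, or None if no candidate fits."""
    feasible = [b for b in search_space() if latency_ms(b) <= budget_ms]
    return max(feasible, key=quality_proxy, default=None)


if __name__ == "__main__":
    for budget in (10.0, 50.0, 150.0):
        best = best_under_budget(budget)
        print(budget, best, None if best is None else round(latency_ms(best), 2))
```

Returning `None` when nothing fits makes an infeasible budget explicit instead of silently falling back to the fastest model. Treating runtime as a hard constraint and quality as the objective mirrors the stated goal of hitting a target on-device runtime while maximizing compression performance.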

Key Contribution

The paper shows that a learned image codec can be both perceptually optimized and practical. It achieves large bitrate savings over traditional and learned alternatives while running fast enough for on-device use on a mobile phone.

Abstract

One of the major differentiators unlocked by learned codecs relative to their hard-coded traditional counterparts is their ability to be optimized directly to appeal to the human visual system. Despite this potential, an image codec that is both perceptual and practical has not yet been proposed. In this work, we aim to close this gap. We conduct a comprehensive study of the key modeling choices that govern the design of a practical learned image codec jointly optimized for perceptual quality and runtime; the ablations include several novel techniques. We then perform performance-aware neural architecture search over millions of backbone configurations to identify models that achieve the target on-device runtime while maximizing compression performance as captured by perceptual metrics. We combine the various optimizations to construct a new codec that achieves a significantly improved tradeoff between speed and perceptual quality. Based on rigorous subjective user studies, it provides 2.3-3x bitrate savings against AV1, AV2, VVC, ECM and JPEG-AI, and 20-40% bitrate savings against the best learned codec alternatives. At the same time, on an iPhone 17 Pro Max, it encodes 12MP images in as little as 230ms and decodes them in 150ms, faster than most top ML-based codecs run on a V100 GPU.
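The savings are quoted in two forms, a multiplicative factor and a percentage, which can be put on a common scale. The sketch below assumes the usual convention that "Nx savings" means the baseline needs N times as many bits at equal perceived quality, and that "P% savings" means the new codec needs (100 - P)% of the baseline's bits. The helper names are hypothetical.

```python
def factor_to_percent(factor: float) -> float:
    """Convert 'Nx savings' (baseline uses N times the bits at equal
    quality) into the percentage of bits saved."""
    return 100.0 * (1.0 - 1.0 / factor)


def percent_to_factor(percent: float) -> float:
    """Convert 'P% savings' (new codec uses (100 - P)% of the bits)
    into the equivalent Nx factor."""
    return 1.0 / (1.0 - percent / 100.0)


def megapixels_per_second(megapixels: float, latency_ms: float) -> float:
    """Throughput implied by processing one image of `megapixels` in `latency_ms`."""
    return megapixels / (latency_ms / 1000.0)


if __name__ == "__main__":
    # Savings against AV1, AV2, VVC, ECM and JPEG-AI, quoted as 2.3-3x.
    print([round(factor_to_percent(f), 1) for f in (2.3, 3.0)])
    # Savings against the best learned codecs, quoted as 20-40%.
    print([round(percent_to_factor(p), 2) for p in (20, 40)])
    # 12MP image: 230 ms to encode, 150 ms to decode on an iPhone 17 Pro Max.
    print(round(megapixels_per_second(12, 230), 1),
          round(megapixels_per_second(12, 150), 1))
```

Under these conventions, 2.3-3x corresponds to roughly 56.5-66.7% fewer bits, and 20-40% corresponds to 1.25-1.67x. The quoted latencies for a 12MP image imply about 52 megapixels per second for encoding and 80 for decoding.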

Topics: Computer Vision · Inference & Quantization

