Adobe Research · Princeton · Apr 21, 2026 · arXiv:2604.19976

Lucky High Dynamic Range Smartphone Imaging

Baiang Li, Ruyu Yan, Ethan Tseng, Zhoutong Zhang, Adam Finkelstein, Jiawen Chen, Felix Heide

AI Summary

This paper introduces an HDR imaging technique for smartphones that operates on bracketed exposures in linear raw pixel space. The method uses a lightweight, iterative inference architecture in which every output pixel is a convex combination of exposure-adjusted input pixels from its neighborhood, which avoids hallucination artifacts. Trained solely on synthetic data, the system generalizes zero-shot to real-world smartphone captures and outperforms other state-of-the-art methods. The same synthetic training scheme also improves those methods over their pretrained counterparts.

Key Contribution

Achieves state-of-the-art HDR imaging on smartphones with a lightweight network that generalizes from synthetic training data to real-world bracketed exposures while avoiding common hallucination artifacts.

Abstract

While the human eye can perceive an impressive twenty stops of dynamic range, smartphone camera sensors remain limited to about twelve stops despite decades of research. A variety of high dynamic range (HDR) image capture and processing techniques have been proposed, and, in practice, they can extend the dynamic range by 3-5 stops for handheld photography. This paper proposes an approach that robustly captures dynamic range using a handheld smartphone camera and lightweight networks suitable for running on mobile devices. Our method operates directly on linear raw pixels in bracketed exposures. Every pixel in the final HDR image is a convex combination of input pixels in the neighborhood, adjusted for exposure, and thus avoids hallucination artifacts typical of recent deep image synthesis networks. We validate our system on both synthetic imagery and unseen real bracketed images, and we confirm zero-shot generalization of the method to smartphone camera captures. Our iterative inference architecture is capable of processing an arbitrary number of bracketed input photos, and we show examples from capture stacks containing 3 to 9 images. Our training process relies only on synthetic captures yet generalizes to unseen real photos from several cameras. Moreover, we show that this training scheme improves other SOTA methods over their pretrained counterparts.
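The convex-combination constraint described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`neighborhood_stack`, `convex_merge`), the square neighborhood, the division by exposure time as the exposure adjustment, and the softmax used to obtain the weights are all assumptions made for illustration. In the paper the weights would come from the lightweight network; here they are simply passed in as arbitrary logits.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax; outputs are non-negative and sum to 1 along axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def neighborhood_stack(img, radius=1):
    """Return a (K, H, W) array of shifted copies of img, K = (2*radius + 1)**2.

    Edges are handled by replicating border pixels (an assumption of this sketch).
    """
    padded = np.pad(img, radius, mode="edge")
    h, w = img.shape
    size = 2 * radius + 1
    shifts = [padded[dy:dy + h, dx:dx + w]
              for dy in range(size) for dx in range(size)]
    return np.stack(shifts)


def convex_merge(frames, exposure_times, logits, radius=1):
    """Merge N bracketed linear raw frames into one HDR estimate.

    frames:         (N, H, W) linear raw values
    exposure_times: (N,) relative exposure of each frame
    logits:         (N * K, H, W) unnormalized per-candidate scores
                    (hypothetical stand-in for a network output)
    """
    frames = np.asarray(frames, dtype=np.float64)
    times = np.asarray(exposure_times, dtype=np.float64)
    # Exposure adjustment: bring every frame to a common radiance scale,
    # then gather each pixel's neighborhood as merge candidates.
    candidates = np.concatenate(
        [neighborhood_stack(f / t, radius) for f, t in zip(frames, times)]
    )
    # Softmax makes the weights non-negative and sum to 1 per pixel,
    # so the output is a convex combination of the candidates.
    weights = softmax(np.asarray(logits, dtype=np.float64), axis=0)
    return (weights * candidates).sum(axis=0)
```

Because the weights are non-negative and sum to one at every pixel, each output value is bounded by the smallest and largest exposure-adjusted candidate in its neighborhood, which is the property the abstract credits for avoiding hallucinated content. The sketch accepts any number of frames, but it processes them in one pass and does not model the paper's iterative inference.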

Computer Vision Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
