RawGen is a diffusion-based framework for text-to-raw image generation and sRGB-to-raw inversion, addressing the scarcity of large-scale raw datasets. It leverages sRGB diffusion model priors to synthesize physically meaningful linear outputs like CIE XYZ or camera-specific raw representations. The framework uses a many-to-one inverse-ISP dataset, anchoring multiple sRGB renditions of the same scene to a common scene-referred target, enabling camera-centric linear reconstructions and improved performance over traditional inverse-ISP methods.
Generate realistic raw camera images from text prompts, sidestepping the bottleneck of limited raw datasets and hardware constraints.
Cameras capture scene-referred linear raw images, which are processed by onboard image signal processors (ISPs) into display-referred 8-bit sRGB outputs. Although raw data is more faithful for low-level vision tasks, collecting large-scale raw datasets remains a major bottleneck, as existing datasets are limited and tied to specific camera hardware. Generative models offer a promising way to address this scarcity; however, existing diffusion frameworks are designed to synthesize photo-finished sRGB images rather than physically consistent linear representations. This paper presents RawGen, to our knowledge the first diffusion-based framework enabling text-to-raw generation for arbitrary target cameras, alongside sRGB-to-raw inversion. RawGen leverages the generative priors of large-scale sRGB diffusion models to synthesize physically meaningful linear outputs, such as CIE XYZ or camera-specific raw representations, via specialized processing in latent and pixel spaces. To handle unknown and diverse ISP pipelines and photo-finishing effects in diffusion-model training data, we build a many-to-one inverse-ISP dataset where multiple sRGB renditions of the same scene generated using diverse ISP parameters are anchored to a common scene-referred target. Fine-tuning a conditional denoiser and specialized decoder on this dataset allows RawGen to obtain camera-centric linear reconstructions that effectively invert the rendering pipeline. We demonstrate RawGen's superior performance over traditional inverse-ISP methods that assume a fixed ISP. Furthermore, we show that augmenting training pipelines with RawGen's scalable, text-driven synthetic data can benefit downstream low-level vision tasks.
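The many-to-one idea and the weakness of a fixed-ISP inverse can be illustrated with a small toy sketch. This is not RawGen's pipeline or dataset code: the ISP below is a deliberately simplified assumption (an exposure gain, a power-law contrast tweak, and the standard sRGB transfer function), and the names `toy_isp`, `fixed_inverse_isp`, and `make_many_to_one_pairs` are hypothetical. It shows several sRGB renditions of one scene being paired with a single linear target, and how an inverse that assumes default ISP settings recovers that target only when the rendition was actually produced with those settings.

```python
import random


def srgb_encode(v):
    """Standard sRGB transfer function: linear value in [0, 1] -> display-encoded."""
    v = min(max(v, 0.0), 1.0)
    return 12.92 * v if v <= 0.0031308 else 1.055 * v ** (1 / 2.4) - 0.055


def srgb_decode(v):
    """Inverse sRGB transfer function: display-encoded -> linear."""
    return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4


def toy_isp(linear_rgb, exposure=1.0, contrast=1.0):
    """Hypothetical, highly simplified ISP (an assumption for illustration only).

    Applies an exposure gain, clips to [0, 1], applies a power-law contrast
    tweak, then encodes with the sRGB transfer function.
    """
    rendered = []
    for c in linear_rgb:
        c = min(max(c * exposure, 0.0), 1.0)
        c = c ** contrast
        rendered.append(srgb_encode(c))
    return tuple(rendered)


def fixed_inverse_isp(srgb):
    """Inverse that assumes the default ISP: it only undoes the sRGB encoding."""
    return tuple(srgb_decode(c) for c in srgb)


def make_many_to_one_pairs(target, n, seed=0):
    """Render one linear target with n random ISP settings.

    Every returned (sRGB rendition, target) pair shares the same target,
    which is the many-to-one structure described above.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        params = {
            "exposure": rng.uniform(0.7, 1.4),   # assumed parameter range
            "contrast": rng.uniform(0.8, 1.3),   # assumed parameter range
        }
        pairs.append((toy_isp(target, **params), target))
    return pairs


if __name__ == "__main__":
    target = (0.18, 0.10, 0.05)  # arbitrary linear RGB value for one pixel
    for srgb, tgt in make_many_to_one_pairs(target, n=3):
        recovered = fixed_inverse_isp(srgb)
        error = max(abs(r - t) for r, t in zip(recovered, tgt))
        print(f"rendition={srgb} max_abs_error_of_fixed_inverse={error:.4f}")
```

In this sketch the fixed inverse is exact only for renditions produced with the default exposure and contrast; for any other setting the recovered values drift from the shared target. A learned inverse trained on such many-to-one pairs would instead be asked to map every rendition back to the same target, which is the motivation the abstract gives for the dataset design.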