This paper introduces a fully open-source pipeline for learning illumination control in diffusion models by finetuning on a dataset of paired poorly-lit/well-lit images with natural language lighting instructions. The data engine transforms well-lit images into supervised training triplets, enabling the diffusion model to learn to adjust lighting based on textual prompts. Finetuning on this dataset leads to significant improvements in perceptual similarity, structural similarity, and identity preservation compared to baseline SD 1.5, SDXL, and FLUX.1-dev models.
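The summary does not specify how the data engine degrades well-lit images, so the following is only a minimal sketch under assumed choices: a simple exposure-plus-gamma darkening as the degradation and a templated instruction string. The function names (`degrade_lighting`, `make_triplet`) and the specific instruction text are hypothetical, not from the paper.

```python
import numpy as np
from PIL import Image


def degrade_lighting(img: Image.Image, exposure: float = 0.35, gamma: float = 2.2) -> Image.Image:
    """Simulate poor illumination (assumed degradation: under-expose, then crush midtones)."""
    x = np.asarray(img).astype(np.float32) / 255.0
    x = np.power(x * exposure, gamma)  # darken globally, then compress shadows
    return Image.fromarray((np.clip(x, 0.0, 1.0) * 255).astype(np.uint8))


def make_triplet(well_lit: Image.Image) -> dict:
    """One supervised example: (poorly-lit input, lighting instruction, well-lit target)."""
    return {
        "input": degrade_lighting(well_lit),
        "instruction": "Relight the scene with bright, even illumination.",  # example template
        "target": well_lit,
    }
```

In this reading, the well-lit source doubles as the supervision target, so no manual relighting annotation is needed; only the degradation and the instruction templates have to be designed.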
Open-source diffusion models can now achieve state-of-the-art illumination control rivaling closed-source alternatives, thanks to a novel training pipeline and dataset.
Controlling illumination in images is essential for photography and visual content creation. While closed-source models have demonstrated impressive illumination control, open-source alternatives either require heavy control inputs such as depth maps or do not release their data and code. We present a fully open-source and reproducible pipeline for learning illumination control in diffusion models. Our approach builds a data engine that transforms well-lit images into supervised training triplets, each consisting of a poorly-illuminated input image, a natural language lighting instruction, and a well-illuminated output image. We finetune a diffusion model on this data and demonstrate significant improvements over baseline SD 1.5, SDXL, and FLUX.1-dev models in perceptual similarity, structural similarity, and identity preservation. The pipeline is built entirely with open-source tools and publicly available data, and we release all code, data, and model weights.
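The abstract names the evaluation axes but not their implementations. A plausible reading is LPIPS for perceptual similarity and SSIM for structural similarity (identity preservation is commonly a face-embedding cosine similarity, e.g. ArcFace, and is omitted here). A minimal sketch using the `lpips` and `scikit-image` packages, with a hypothetical `evaluate_pair` helper:

```python
import lpips  # perceptual similarity; lower LPIPS = more similar
import numpy as np
import torch
from skimage.metrics import structural_similarity

loss_fn = lpips.LPIPS(net="alex")  # AlexNet backbone, the common default


def evaluate_pair(pred: np.ndarray, target: np.ndarray) -> dict:
    """pred/target: HxWx3 float arrays in [0, 1]. Returns assumed metrics only."""
    # LPIPS expects NCHW tensors scaled to [-1, 1]
    to_tensor = lambda a: torch.from_numpy(a).permute(2, 0, 1).unsqueeze(0).float() * 2 - 1
    perceptual = loss_fn(to_tensor(pred), to_tensor(target)).item()
    structural = structural_similarity(pred, target, channel_axis=2, data_range=1.0)
    return {"lpips": perceptual, "ssim": structural}
```

Under this setup, a finetuned model improves over a baseline when LPIPS drops and SSIM rises on held-out relighting triplets; whether the paper uses exactly these packages is an assumption.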