Apr 22, 2026

arXiv:2604.20715

GeoRelight: Learning Joint Geometrical Relighting and Reconstruction with Flexible Multi-Modal Diffusion Transformers

Yuxuan Xue, Ruofan Liang, Egor Zakharov, Timur Bagautdinov, Chen Cao, Giljoo Nam, Shunsuke Saito, Gerard Pons-Moll, Javier Romero

AI Summary

The paper introduces GeoRelight, a Multi-Modal Diffusion Transformer (DiT) that jointly estimates 3D geometry and relights a person from a single image. The authors propose an isotropic NDC-Orthographic Depth (iNOD) representation to enable distortion-free 3D reasoning within a latent diffusion model. Trained on a mixture of synthetic and auto-labeled real data, GeoRelight achieves state-of-the-art relighting quality and geometric accuracy compared to methods that treat these tasks separately.

Key Contribution

Jointly modeling 3D geometry and relighting in a diffusion framework unlocks physically plausible single-image relighting that surpasses previous pipeline-based or geometry-agnostic approaches.

Abstract

Relighting a person from a single photo is an attractive but ill-posed task, as a 2D image ambiguously entangles 3D geometry, intrinsic appearance, and illumination. Current methods either use sequential pipelines that suffer from error accumulation, or they do not explicitly leverage 3D geometry during relighting, which limits physical consistency. Since relighting and estimation of 3D geometry are mutually beneficial tasks, we propose a unified Multi-Modal Diffusion Transformer (DiT) that jointly solves for both: GeoRelight. We make this possible through two key technical contributions: isotropic NDC-Orthographic Depth (iNOD), a distortion-free 3D representation compatible with latent diffusion models; and a strategic mixed-data training method that combines synthetic and auto-labeled real data. By solving geometry and relighting jointly, GeoRelight achieves better performance than both sequential models and previous systems that ignored geometry.

Architecture Design (Transformers, SSMs, MoE), Computer Vision, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
