The paper introduces Multi-Object Generative Perception (MultiGP), a generative inverse rendering method that stochastically samples reflectance, texture, and illumination from a single image of objects with known shapes by exploiting the illumination those objects share. MultiGP uses a cascaded architecture with coordinated diffusion guidance, axial attention for inter-object communication, and a Texture Extraction ControlNet to disentangle the radiometric components. Experiments show that MultiGP effectively recovers individual object properties and the shared illumination by leveraging the complementary spatial and frequency characteristics of multiple objects.
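The summary says axial attention lets objects "communicate", but does not give the tensor layout. The NumPy sketch below is an illustrative assumption, not the paper's implementation. It stacks per-object features as a hypothetical `(objects, tokens, channels)` array and runs single-head self-attention along the object axis only, which is one axis of an axial-attention scheme. Each token position then mixes information across objects without attending over all tokens of all objects. The function name `object_axis_attention` and the random projection weights are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def object_axis_attention(feats, w_q, w_k, w_v):
    """Hypothetical single-head self-attention along the object axis.

    feats: (N, L, D) array with N objects, L tokens per object, D channels.
    Token position l attends only to position l in every object, so the
    score tensor is (L, N, N) instead of (N*L, N*L).
    """
    x = np.transpose(feats, (1, 0, 2))                 # (L, N, D)
    q, k, v = x @ w_q, x @ w_k, x @ w_v                # (L, N, D) each
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])  # (L, N, N)
    out = softmax(scores, axis=-1) @ v                 # (L, N, D)
    return np.transpose(out, (1, 0, 2))                # back to (N, L, D)

rng = np.random.default_rng(0)
N, L, D = 3, 16, 8                                     # 3 objects, 16 tokens each
feats = rng.standard_normal((N, L, D))
w = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3)]
out = object_axis_attention(feats, *w)
print(out.shape)  # (3, 16, 8)
```

With a single object the attention weights collapse to 1, so the layer reduces to the value projection. This makes the "cross-talk" visible: any change in one object's output at a token position must come from the other objects at that same position.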
Radiometric disentanglement from a single image becomes tractable by exploiting the shared illumination constraint across multiple objects, enabling stochastic sampling of reflectance, texture, and illumination.
We introduce Multi-Object Generative Perception (MultiGP), a generative inverse rendering method for stochastic sampling of all radiometric constituents underlying object appearance (reflectance, texture, and illumination) from a single image. Our key idea for resolving this inherently ambiguous radiometric disentanglement is to leverage the fact that, while their textures and reflectances may differ, objects in the same scene are all lit by the same illumination. MultiGP exploits this consensus to produce samples of reflectance, texture, and illumination from a single image of known shapes, based on four key technical contributions: a cascaded end-to-end architecture that combines image-space and angular-space disentanglement; Coordinated Guidance, which drives the diffusion process to converge to a single, consistent illumination estimate; Axial Attention, which facilitates "cross-talk" between objects of different reflectance; and a Texture Extraction ControlNet that preserves high-frequency texture details while keeping them decoupled from the estimated lighting. Experimental results demonstrate that MultiGP effectively leverages the complementary spatial and frequency characteristics of multiple object appearances to recover individual texture and reflectance as well as the common illumination.
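The abstract does not say how Coordinated Guidance makes the diffusion process converge to one illumination estimate. As a rough illustration of the consensus idea only, the NumPy sketch below runs one deterministic DDIM-style chain per object. At every step it blends each chain's clean-signal prediction toward the mean over all chains. The consensus-averaging rule, the `guidance` weight, and the toy per-object denoisers (`make_toy_denoiser`, each biased toward its own lighting guess) are hypothetical stand-ins, not the paper's formulation.

```python
import numpy as np

def make_toy_denoiser(guess):
    """Hypothetical stand-in for one object's illumination diffusion model.

    It is biased toward its own (ambiguous) lighting guess but also reads
    the noisy input, mimicking a model that sees only one object.
    """
    def predict_x0(x_t, abar):
        return 0.8 * guess + 0.2 * np.sqrt(abar) * x_t
    return predict_x0

def coordinated_sampling(denoisers, shape, steps=50, guidance=1.0, seed=0):
    """Deterministic DDIM-style sampling with one illumination chain per object.

    guidance in [0, 1] blends each chain's clean-signal prediction toward the
    mean over all chains: 0 gives independent chains, 1 makes every chain use
    the shared consensus prediction at every step.
    """
    rng = np.random.default_rng(seed)
    n = len(denoisers)
    x = rng.standard_normal((n, *shape))        # independent starting noise
    abar = np.linspace(1e-3, 1.0, steps + 1)    # signal level, noisy -> clean
    for a_cur, a_nxt in zip(abar[:-1], abar[1:]):
        x0 = np.stack([f(x[i], a_cur) for i, f in enumerate(denoisers)])
        consensus = x0.mean(axis=0, keepdims=True)
        x0 = (1.0 - guidance) * x0 + guidance * consensus
        # Recover the implied noise, then step to the next (cleaner) level.
        eps = (x - np.sqrt(a_cur) * x0) / np.sqrt(1.0 - a_cur)
        x = np.sqrt(a_nxt) * x0 + np.sqrt(max(1.0 - a_nxt, 0.0)) * eps
    return x

guesses = [0.2, 0.5, 0.9]  # three objects, three conflicting lighting guesses
dens = [make_toy_denoiser(np.full((8, 16), g)) for g in guesses]
joint = coordinated_sampling(dens, (8, 16), guidance=1.0)
indep = coordinated_sampling(dens, (8, 16), guidance=0.0)
print(np.std(joint, axis=0).max(), np.std(indep, axis=0).mean())
```

With `guidance=1.0` the final step lands every chain on the same consensus prediction, so the per-pixel spread across objects is essentially zero. With `guidance=0.0` each chain settles near its own object's guess. The toy only shows why coupling the per-object chains yields a single shared estimate; it makes no claim about how MultiGP weights or computes that consensus.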