The paper introduces SkeleGuide, a framework for context-aware human-in-place image synthesis that explicitly reasons about human skeletal structure to improve structural integrity. SkeleGuide jointly trains its reasoning and rendering stages so that the model produces an internal pose representation, which acts as a structural prior during synthesis. The authors also propose PoseInverter, a module that decodes the internal latent pose into an explicit, editable format, enabling fine-grained user control.
Generating humans that look right in a scene just got easier: SkeleGuide uses explicit skeletal reasoning to avoid distorted limbs and unnatural poses, outperforming both specialized and general-purpose models.
Synthesizing realistic and structurally plausible humans within existing scenes remains a significant challenge for current generative models, which often produce artifacts such as distorted limbs and unnatural poses. We attribute this systemic failure to an inability to perform explicit reasoning over human skeletal structure. To address this, we introduce SkeleGuide, a novel framework built upon explicit skeletal reasoning. Through joint training of its reasoning and rendering stages, SkeleGuide learns to produce an internal pose that acts as a strong structural prior, guiding the synthesis towards high structural integrity. For fine-grained user control, we introduce PoseInverter, a module that decodes this internal latent pose into an explicit and editable format. Extensive experiments demonstrate that SkeleGuide significantly outperforms both specialized and general-purpose models in generating high-fidelity, context-aware human images. Our work provides compelling evidence that explicitly modeling skeletal structure is a fundamental step towards robust and plausible human image synthesis.
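The data flow described in the abstract (a reasoning stage emits an internal latent pose, the rendering stage is conditioned on it, and PoseInverter decodes it into explicit keypoints) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the fixed random linear maps, the latent size, and the 17-joint (COCO-style) layout are all assumptions made for illustration, and joint training is not shown.

```python
import math
import random
from typing import Dict, List, Tuple

NUM_JOINTS = 17  # assumption: COCO-style joint count, not stated in the abstract
LATENT_DIM = 8   # assumption: arbitrary toy latent size


def _toy_weights(rows: int, cols: int, seed: int) -> List[List[float]]:
    """Fixed pseudo-random weights standing in for learned parameters."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def _matvec(w: List[List[float]], x: List[float]) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def reason_pose(scene_features: List[float]) -> List[float]:
    """Hypothetical reasoning stage: scene features -> internal latent pose."""
    w = _toy_weights(LATENT_DIM, len(scene_features), seed=0)
    return [math.tanh(v) for v in _matvec(w, scene_features)]


def pose_inverter(latent_pose: List[float]) -> List[Tuple[float, float]]:
    """Hypothetical PoseInverter: latent pose -> editable (x, y) keypoints in (0, 1)."""
    w = _toy_weights(2 * NUM_JOINTS, len(latent_pose), seed=1)
    flat = [1.0 / (1.0 + math.exp(-v)) for v in _matvec(w, latent_pose)]
    return [(flat[2 * j], flat[2 * j + 1]) for j in range(NUM_JOINTS)]


def render(scene_features: List[float], latent_pose: List[float]) -> Dict[str, List[float]]:
    """Hypothetical rendering stage: only returns the conditioning it would consume."""
    return {"scene": list(scene_features), "pose_prior": list(latent_pose)}


scene = [0.2, -0.5, 0.9, 0.1]           # made-up scene features
latent = reason_pose(scene)             # internal pose acting as a structural prior
keypoints = pose_inverter(latent)       # explicit format a user could inspect or edit
conditioning = render(scene, latent)    # renderer sees both the scene and the pose prior
print(len(latent), len(keypoints))
```

The point of the sketch is the separation of concerns: the renderer consumes the latent pose directly, while the decoded keypoints exist as a human-readable view of the same pose. How edited keypoints are fed back into synthesis is not specified in the abstract and is left out here.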