arXiv:2605.27938 (May 27, 2026)

SEMAGIC: Learning Semantically Consistent Deformable 3D Representations from In-the-Wild Images

Sky Cen, Wufei Ma, Alan L. Yuille, Adam Kortylewski

AI Summary

SEMAGIC is introduced to learn semantically consistent deformable 3D object models from single-view in-the-wild images by explicitly coupling geometric deformation with semantic alignment. The framework represents each category with a canonical template mesh and a learned deformation field, similar to an autoencoder, to reconstruct instance geometry from image features while maintaining consistent semantic meaning across vertices. Training incorporates a feature-level consistency loss and vertex-index-conditioned deformation to enforce semantic correspondence, leading to improved performance on semantic correspondence benchmarks.

Key Contribution

Deformable 3D reconstruction can yield more than visually plausible geometry: SEMAGIC shows that it can also learn stable semantic correspondences, improving the semantic correspondence of deformable models by +14.7 PCK@0.1 on SPair-71k.
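For readers unfamiliar with the metric, the following is a minimal sketch of how PCK@0.1 is commonly computed. It assumes the usual convention in which a transferred keypoint counts as correct when its distance to the ground truth is at most 0.1 times the larger side of the object bounding box. The function name `pck` and the toy coordinates are illustrative assumptions, not taken from the paper or its evaluation code.

```python
import numpy as np

def pck(pred, gt, bbox_hw, alpha=0.1):
    """Fraction of keypoints whose error is within alpha * max(bbox height, width).

    pred, gt: (K, 2) arrays of predicted and ground-truth keypoints in pixels.
    bbox_hw:  (height, width) of the target object's bounding box (assumed convention).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    threshold = alpha * max(bbox_hw)
    dists = np.linalg.norm(pred - gt, axis=1)  # per-keypoint Euclidean error
    return float(np.mean(dists <= threshold))

# Toy example (made-up coordinates): two of three keypoints fall within the threshold.
pred = [[10.0, 10.0], [50.0, 52.0], [90.0, 40.0]]
gt = [[12.0, 11.0], [50.0, 50.0], [60.0, 40.0]]
score = pck(pred, gt, bbox_hw=(100, 80))
```

Under this convention, a gain of +14.7 PCK@0.1 means that 14.7 percentage points more keypoints land within the threshold.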

Abstract

Learning deformable 3D object models from single-view in-the-wild images has enabled impressive 3D shape reconstruction without supervision. However, it remains unclear whether these models capture the semantic structure required for downstream tasks. We find that existing deformable reconstruction approaches, despite producing visually plausible geometry, yield unstable correspondences across instances and perform poorly on semantic correspondence benchmarks. We introduce SEMAGIC, a framework for learning semantically consistent deformable 3D representations from single-view in-the-wild images. Rather than treating reconstruction as the end goal, SEMAGIC uses deformable modeling as a mechanism to discover category-level correspondences. Each category is represented by a canonical template mesh and a learned deformation field, functioning similarly to an autoencoder that reconstructs instance geometry from image features, enabling vertices to maintain consistent semantic meaning across instances. Semantic consistency is enforced during training through (i) a feature-level consistency loss aligning semantic features between canonical and deformed meshes, and (ii) vertex-index-conditioned deformation that preserves semantic correspondence across instances. By explicitly coupling geometric deformation with semantic alignment, SEMAGIC produces representations that maintain stable part correspondences across intra-category variation. Experiments demonstrate that SEMAGIC improves semantic correspondence of deformable models by +14.7 PCK@0.1 on SPair-71k, establishing deformable models as effective semantic 3D representations.
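The two training ingredients named in the abstract can be illustrated with a toy NumPy sketch. Everything below is an assumption made for illustration: the abstract does not specify the architecture, so the per-vertex embedding table, the two-layer MLP, the cosine-based loss, and all sizes are hypothetical stand-ins rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (all assumed): vertices, index-embedding dim, image-latent dim,
# hidden width, semantic-feature dim.
N, D_EMB, D_LAT, D_HID, D_FEAT = 6, 4, 8, 16, 5

template = rng.normal(size=(N, 3))        # canonical template mesh vertices
vertex_emb = rng.normal(size=(N, D_EMB))  # one embedding per vertex index (hypothetical)
W1 = rng.normal(size=(D_EMB + D_LAT, D_HID)) * 0.1  # stand-in for learned weights
W2 = rng.normal(size=(D_HID, 3)) * 0.1

def deform(template, vertex_emb, latent):
    """Offset each template vertex using its index embedding and an image latent."""
    lat = np.broadcast_to(latent, (template.shape[0], latent.shape[0]))
    hidden = np.tanh(np.concatenate([vertex_emb, lat], axis=1) @ W1)
    return template + hidden @ W2  # vertex i stays vertex i after deformation

def feature_consistency_loss(canon_feats, inst_feats, eps=1e-8):
    """Mean (1 - cosine similarity) between features paired by vertex index."""
    a = canon_feats / (np.linalg.norm(canon_feats, axis=1, keepdims=True) + eps)
    b = inst_feats / (np.linalg.norm(inst_feats, axis=1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(a * b, axis=1)))

latent = rng.normal(size=D_LAT)           # stand-in for an image encoder output
deformed = deform(template, vertex_emb, latent)

canon_feats = rng.normal(size=(N, D_FEAT))
loss_same = feature_consistency_loss(canon_feats, canon_feats)
loss_diff = feature_consistency_loss(canon_feats, rng.normal(size=(N, D_FEAT)))
```

Because the offset for vertex i depends on the embedding of index i, the same index is meant to track the same object part across instances. The loss is near zero when the features at the deformed vertices match the canonical ones and grows as they diverge.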

Computer Vision, Multimodal Models, Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 37

Year: 2026

Venue: N/A
