Mar 19, 2026 | arXiv:2603.19231

MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction

Haitian Li, Haozhe Xie, Junxiang Xu, Beichen Wen, Fangzhou Hong, Ziwei Liu

AI Summary

MonoArt is a framework for monocular articulated 3D reconstruction that disentangles motion cues from object structure through progressive structural reasoning. Within a single architecture, it transforms visual observations into canonical geometry, structured part representations, and motion-aware embeddings, which makes articulation inference stable. Experiments on PartNet-Mobility show state-of-the-art reconstruction accuracy and inference speed, and the method generalizes to robotic manipulation and articulated scene reconstruction.

Key Contribution

Fast single-image 3D understanding: MonoArt achieves state-of-the-art monocular articulated object reconstruction in both accuracy and inference speed, without relying on multi-view data or external motion templates.

Abstract

Reconstructing articulated 3D objects from a single image requires jointly inferring object geometry, part structure, and motion parameters from limited visual evidence. A key difficulty lies in the entanglement between motion cues and object structure, which makes direct articulation regression unstable. Existing methods address this challenge through multi-view supervision, retrieval-based assembly, or auxiliary video generation, often sacrificing scalability or efficiency. We present MonoArt, a unified framework grounded in progressive structural reasoning. Rather than predicting articulation directly from image features, MonoArt progressively transforms visual observations into canonical geometry, structured part representations, and motion-aware embeddings within a single architecture. This structured reasoning process enables stable and interpretable articulation inference without external motion templates or multi-stage pipelines. Extensive experiments on PartNet-Mobility demonstrate that MonoArt achieves state-of-the-art performance in both reconstruction accuracy and inference speed. The framework further generalizes to robotic manipulation and articulated scene reconstruction.
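The abstract refers to "motion parameters" without spelling them out. A common convention, used for example in URDF files such as those distributed with PartNet-Mobility, describes each movable part by a joint type (revolute or prismatic), a joint axis, and, for revolute joints, a pivot point on that axis. Whether MonoArt predicts exactly this parameterization is an assumption here, and the names `Joint` and `articulate` are hypothetical. The minimal sketch below only shows what such parameters do once inferred: a revolute joint rotates a part's points about the axis through the pivot (Rodrigues' rotation formula), and a prismatic joint slides them along the axis.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Joint:
    """Hypothetical joint record (not MonoArt's actual output format)."""
    kind: str      # "revolute" (rotation) or "prismatic" (translation)
    axis: Vec3     # joint direction; assumed non-zero, normalized internally
    origin: Vec3   # pivot point on the axis; unused for prismatic joints


def _normalize(v: Vec3) -> Vec3:
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / n, v[1] / n, v[2] / n)


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def articulate(points: Sequence[Vec3], joint: Joint, state: float) -> List[Vec3]:
    """Move a part's points to joint `state`: an angle in radians for
    revolute joints, a displacement along the axis for prismatic joints."""
    if joint.kind not in ("revolute", "prismatic"):
        raise ValueError(f"unsupported joint kind: {joint.kind}")
    k = _normalize(joint.axis)
    out: List[Vec3] = []
    for p in points:
        if joint.kind == "prismatic":
            # Slide the point along the unit axis.
            out.append((p[0] + state * k[0], p[1] + state * k[1], p[2] + state * k[2]))
            continue
        # Revolute: express the point relative to the pivot, rotate with
        # Rodrigues' formula v' = v cos t + (k x v) sin t + k (k . v)(1 - cos t),
        # then shift back.
        v = (p[0] - joint.origin[0], p[1] - joint.origin[1], p[2] - joint.origin[2])
        c, s = math.cos(state), math.sin(state)
        kxv = _cross(k, v)
        kdv = _dot(k, v)
        rot = tuple(v[i] * c + kxv[i] * s + k[i] * kdv * (1.0 - c) for i in range(3))
        out.append((rot[0] + joint.origin[0], rot[1] + joint.origin[1], rot[2] + joint.origin[2]))
    return out
```

For instance, a door hinged on a vertical axis through (1, 0, 0) would be `Joint("revolute", (0.0, 0.0, 1.0), (1.0, 0.0, 0.0))`, and `articulate([(2.0, 0.0, 0.0)], door, math.pi / 2)` swings the door's edge to approximately (1, 1, 0). Keeping the geometry in a canonical (rest) pose and applying motion only through these few parameters is one way to read the abstract's separation of structure from motion, though the paper's own formulation may differ.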

Computer Vision, Reasoning & Chain-of-Thought, Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 63

Year: 2026

Venue: N/A
