Articulat3D reconstructs articulated 3D objects from monocular videos by jointly optimizing for geometric accuracy and motion coherence. It introduces Motion Prior-Driven Initialization to decompose the scene into rigidly moving groups using 3D point tracks and motion bases. The method then refines the reconstruction using Geometric and Motion Constraints Refinement, which enforces physically plausible articulation via learnable kinematic primitives.
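The idea of decomposing a scene into rigidly moving groups via a compact set of motion bases can be sketched numerically: each tracked point's trajectory is modeled as a soft, convex blend of a few per-frame rigid transforms, and points dominated by the same basis form one rigid group. This is a minimal illustrative sketch, not the authors' implementation; the function names and the simple NumPy blending are assumptions for exposition.

```python
import numpy as np

def apply_se3(R, t, X):
    """Apply a rigid transform (R: 3x3, t: (3,)) to points X of shape (N, 3)."""
    return X @ R.T + t

def blended_tracks(X0, bases, weights):
    """
    Soft decomposition sketch (hypothetical, not the paper's code):
      X0      : (N, 3)  canonical 3D point positions
      bases   : per-frame list of K rigid motion bases, each an (R, t) pair
      weights : (N, K)  soft assignment of each point to each basis;
                rows sum to 1 (e.g. a softmax over learned logits)
    Returns (T, N, 3) predicted per-frame point positions.
    """
    T = len(bases)
    out = np.zeros((T, X0.shape[0], 3))
    for f in range(T):
        # Transform all points by every basis, then blend with the soft weights.
        per_basis = np.stack([apply_se3(R, t, X0) for R, t in bases[f]])  # (K, N, 3)
        out[f] = np.einsum('nk,kni->ni', weights, per_basis)
    return out
```

With near-one-hot weights this reduces to a hard grouping of points into rigid parts; keeping the weights soft lets the assignment be optimized jointly with the motion bases.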
Reconstructing articulated 3D objects from casual monocular videos is now possible with Articulat3D, which enforces geometric and motion constraints for geometrically accurate and temporally coherent digital twins.
Building high-fidelity digital twins of articulated objects from visual data remains a central challenge. Existing approaches depend on multi-view captures of the object in discrete, static states, which severely constrains their real-world scalability. In this paper, we introduce Articulat3D, a novel framework that constructs such digital twins from casually captured monocular videos by jointly enforcing explicit 3D geometric and motion constraints. We first propose Motion Prior-Driven Initialization, which leverages 3D point tracks to exploit the low-dimensional structure of articulated motion. By modeling scene dynamics with a compact set of motion bases, we facilitate soft decomposition of the scene into multiple rigidly-moving groups. Building on this initialization, we introduce Geometric and Motion Constraints Refinement, which enforces physically plausible articulation through learnable kinematic primitives parameterized by a joint axis, a pivot point, and per-frame motion scalars, yielding reconstructions that are both geometrically accurate and temporally coherent. Extensive experiments demonstrate that Articulat3D achieves state-of-the-art performance on synthetic benchmarks and real-world casually captured monocular videos, significantly advancing the feasibility of digital twin creation under uncontrolled real-world conditions. Our project page is at https://maxwell-zhao.github.io/Articulat3D.
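The kinematic primitive described in the abstract (a joint axis, a pivot point, and per-frame motion scalars) can be made concrete with a short sketch. For a revolute joint, the scalar is a rotation angle about the axis through the pivot; for a prismatic joint, it is a translation along the axis. This is an assumed illustration of the parameterization, not the authors' code.

```python
import numpy as np

def rodrigues(axis, theta):
    """Rotation matrix for angle theta about a unit axis (Rodrigues' formula)."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    K = np.array([[0.0, -a[2], a[1]],
                  [a[2], 0.0, -a[0]],
                  [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def articulate(X, axis, pivot, theta, joint_type="revolute"):
    """
    Hypothetical kinematic-primitive sketch: transform part points X (N, 3)
    given a joint axis, a pivot point, and one per-frame motion scalar theta.
      - revolute : rotate by theta about the axis passing through the pivot
      - prismatic: slide by theta along the (normalized) axis
    """
    axis = np.asarray(axis, dtype=float)
    pivot = np.asarray(pivot, dtype=float)
    if joint_type == "revolute":
        R = rodrigues(axis, theta)
        return (X - pivot) @ R.T + pivot
    # prismatic joint: pure translation along the axis direction
    return X + theta * axis / np.linalg.norm(axis)
```

Evaluating `articulate` with the same axis and pivot but a different scalar per frame yields a temporally coherent motion constrained to one degree of freedom, which is the physical-plausibility constraint the refinement stage enforces.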