The paper introduces 4DEquine, a framework for 4D reconstruction of equines from monocular video by disentangling motion and appearance reconstruction. A spatio-temporal transformer with post-optimization is used for motion, while a feed-forward network reconstructs a 3D Gaussian avatar for appearance. Trained on newly created synthetic datasets (VarenPoser and VarenTex), 4DEquine achieves state-of-the-art performance on real-world datasets.
Reconstructing realistic 4D horses from video is now faster and more accurate thanks to a clever disentangling of motion and appearance.
4D reconstruction of the equine family (e.g., horses) from monocular video is important for animal welfare. Previous mainstream 4D animal reconstruction methods require joint optimization of motion and appearance over an entire video, which is time-consuming and sensitive to incomplete observations. In this work, we propose a novel framework, 4DEquine, that disentangles the 4D reconstruction problem into two sub-problems: dynamic motion reconstruction and static appearance reconstruction. For motion, we introduce a simple yet effective spatio-temporal transformer with a post-optimization stage to regress smooth, pixel-aligned pose and shape sequences from video. For appearance, we design a novel feed-forward network that reconstructs a high-fidelity, animatable 3D Gaussian avatar from as few as a single image. To assist training, we create a large-scale synthetic motion dataset, VarenPoser, which features high-quality surface motions and diverse camera trajectories, as well as a synthetic appearance dataset, VarenTex, comprising realistic multi-view images generated through multi-view diffusion. Although trained only on synthetic datasets, 4DEquine achieves state-of-the-art performance on the real-world APT36K and AiM datasets, demonstrating the superiority of 4DEquine and our new datasets for both geometry and appearance reconstruction. Comprehensive ablation studies validate the effectiveness of both the motion and appearance reconstruction networks. Project page: https://luoxue-star.github.io/4DEquine_Project_Page/.
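To make the disentangled design concrete, here is a minimal, hypothetical sketch of the two-branch pipeline described above: a motion branch that maps a video clip to a pose/shape sequence followed by a temporal post-optimization step, and an appearance branch that predicts a static Gaussian avatar from a single frame. All function names, parameter shapes, and the stand-in computations (a random linear projection for the transformer, a moving average for post-optimization) are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def estimate_motion(frames: np.ndarray) -> np.ndarray:
    """Stand-in for the spatio-temporal transformer: map T video frames
    to per-frame pose/shape parameters (here, a fixed random projection)."""
    T = frames.shape[0]
    flat = frames.reshape(T, -1)  # flatten each frame into a feature vector
    rng = np.random.default_rng(0)
    W = rng.standard_normal((flat.shape[1], 8)) * 0.01  # toy "network" weights
    return flat @ W  # (T, 8) raw pose/shape sequence

def post_optimize(poses: np.ndarray, window: int = 3) -> np.ndarray:
    """Stand-in for the post-optimization stage: temporal smoothing
    (moving average) to produce a smooth pose sequence."""
    kernel = np.ones(window) / window
    return np.stack(
        [np.convolve(poses[:, d], kernel, mode="same")
         for d in range(poses.shape[1])],
        axis=1,
    )

def reconstruct_appearance(image: np.ndarray, n_gaussians: int = 64) -> dict:
    """Stand-in for the feed-forward appearance network: predict a static
    3D Gaussian avatar (means, scales, colors) from a single image."""
    rng = np.random.default_rng(1)
    mean_color = image.reshape(-1, image.shape[-1]).mean(axis=0)
    return {
        "means": rng.standard_normal((n_gaussians, 3)),   # Gaussian centers
        "scales": np.full((n_gaussians, 3), 0.05),        # isotropic scales
        "colors": np.tile(mean_color, (n_gaussians, 1)),  # per-Gaussian color
    }

def reconstruct_4d(frames: np.ndarray) -> tuple[np.ndarray, dict]:
    """Disentangled 4D reconstruction: motion from the whole clip,
    appearance from a single frame (as few as one image suffices)."""
    poses = post_optimize(estimate_motion(frames))
    avatar = reconstruct_appearance(frames[0])
    return poses, avatar

# Toy usage on a synthetic 10-frame RGB clip.
clip = np.zeros((10, 4, 4, 3))
poses, avatar = reconstruct_4d(clip)
```

The key design point this sketch mirrors is that the expensive per-video joint optimization is replaced by two feed-forward passes: the pose sequence depends on the clip, while the avatar is reconstructed once and can then be animated by any pose sequence.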