Jun 16, 2026 | arXiv:2606.18250

Future Dynamic 3D Reconstruction: A 3D World Model with Disentangled Ego-Motion

Nils Morbitzer, Jonathan Evers, Artem Savkin, Thomas Stauner, Nassir Navab, Federico Tombari, Stefano Gasperini

AI Summary

This paper introduces FR3D, a world model that predicts a persistent 3D latent representation of dynamic environments by disentangling ego-motion from environmental dynamics. Treating the inferred ego-motion as a latent proxy for action, FR3D targets the physical inconsistencies of prior generative models, such as objects that morph or vanish over time. A teacher-student distillation strategy built on off-the-shelf foundation models supports zero-shot generalization, and experiments across multiple datasets report strong performance for future dynamic 3D reconstruction from monocular observations, even 2 seconds into the future.

Key Contribution

Disentangling ego-motion from environmental dynamics resolves the ambiguity between self-motion and world-motion, which lets FR3D keep its future 3D reconstructions geometrically consistent.

Abstract

Forecasting the evolution of dynamic environments is crucial for autonomous agents. While generative world models have recently achieved high photorealism in 2D video synthesis by mixing ego-motion and environmental dynamics within the image plane, they exhibit physical inconsistencies, such as morphing or vanishing objects, especially over long time horizons. In this paper, we propose FR3D, a world model that predicts a persistent 3D latent representation for future dynamic 3D reconstruction. Unlike prior works that treat the world as a sequence of image-based features, FR3D explicitly decouples the 3D evolution of the scene from the agent's trajectory, treating the inferred ego-motion as a latent proxy for action. This disentanglement resolves the ambiguities between self-motion and world-motion, ensuring geometric consistency into the future. Furthermore, we introduce a teacher-student distillation strategy that leverages the spatial "common sense" of off-the-shelf foundation models, leading to robust zero-shot generalization. Extensive experiments demonstrate FR3D's strong performance for future dynamic 3D reconstruction from monocular observations across multiple datasets, even 2 seconds into the future. Project page: https://fr3d-wm.github.io.
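The decoupling described in the abstract can be illustrated with a toy geometric example. The sketch below is not FR3D's method, which operates on a learned 3D latent and infers ego-motion itself; it only assumes known camera poses and a single tracked 3D point, and every function name in it is hypothetical. It shows why separating the two sources of motion removes ambiguity: the same camera-frame displacement can come from the scene moving, from the agent moving, or from both.

```python
import math

def yaw_pose(yaw, tx, ty, tz):
    """Camera-to-world pose (R, t): rotation about the z axis by `yaw`, then translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    R = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    return R, [tx, ty, tz]

def cam_to_world(pose, p_cam):
    """Map a point from camera coordinates to world coordinates."""
    R, t = pose
    return [sum(R[i][j] * p_cam[j] for j in range(3)) + t[i] for i in range(3)]

def world_to_cam(pose, p_world):
    """Map a point from world coordinates to camera coordinates."""
    R, t = pose
    d = [p_world[i] - t[i] for i in range(3)]
    # R is orthonormal, so its inverse is its transpose.
    return [sum(R[j][i] * d[j] for j in range(3)) for i in range(3)]

def split_motion(pose_t0, pose_t1, p_cam_t0, p_cam_t1):
    """Split apparent camera-frame motion into scene motion and ego-induced motion.

    Returns (scene_motion, ego_induced):
      scene_motion: displacement of the point in the fixed world frame.
      ego_induced:  camera-frame displacement the point would show
                    if it had stayed still in the world.
    """
    w0 = cam_to_world(pose_t0, p_cam_t0)
    w1 = cam_to_world(pose_t1, p_cam_t1)
    scene_motion = [b - a for a, b in zip(w0, w1)]
    static_cam_t1 = world_to_cam(pose_t1, w0)
    ego_induced = [b - a for a, b in zip(p_cam_t0, static_cam_t1)]
    return scene_motion, ego_induced

# A static point 5 m ahead while the camera drives 1 m forward:
# all apparent motion is ego-induced, none belongs to the scene.
pose0 = yaw_pose(0.0, 0.0, 0.0, 0.0)
pose1 = yaw_pose(0.0, 1.0, 0.0, 0.0)
scene, ego = split_motion(pose0, pose1, [5.0, 0.0, 0.0], [4.0, 0.0, 0.0])
```

Here `scene` is the zero vector and `ego` is one metre toward the camera along x. A model that mixes both effects in the image plane has to explain the same observation without this separation, which is the ambiguity the abstract refers to.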

Computer Vision | Tool Use & Agents | World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
