Tsinghua AIAcademy, CAS, HKU, USTC | Jun 8, 2026 | arXiv:2606.09187

CP4D: Compositional Physics-aware 4D Scene Generation

Hanxin Zhu, Cong Wang, Tianyu He, Long Chen, Xin Jin, Chen Gao, Zhibo Chen

AI Summary

CP4D introduces a novel approach to 4D scene generation by integrating static 3D environments with dynamic, physically grounded objects, addressing the limitations of existing methods that often produce visually implausible results. The framework employs a three-stage pipeline that utilizes pre-trained models for high-fidelity representations, a hybrid motion synthesis strategy combining physical simulation and video diffusion models for realistic object interactions, and an automated composition mechanism for coherent scene integration. Experimental results show that CP4D achieves superior visual fidelity and physical plausibility compared to prior techniques, enabling the creation of interactive 4D scenes with fine-grained controllability.

Key Contribution

CP4D achieves photorealistic 4D scene generation by seamlessly integrating static environments with dynamic objects, outperforming existing methods in visual fidelity and physical consistency.

Abstract

4D generation (i.e., dynamic 3D generation) has recently emerged as a rapidly growing research frontier due to its powerful spatiotemporal modeling capabilities. However, despite notable advances, existing approaches typically fail to capture the underlying physical principles, producing results that are both physically inconsistent and visually implausible. To overcome this limitation, we present CP4D, a novel paradigm for photorealistic 4D scene synthesis with faithful adherence to complex physical dynamics. Drawing inspiration from the compositional nature of real-world scenes, where immutable static backgrounds coexist with dynamic, physically plausible foregrounds, CP4D reformulates 4D generation as the integration of a static 3D environment with physically grounded dynamic objects. On this basis, our framework follows a three-stage pipeline: 1) First, we leverage pre-trained expert models to generate high-fidelity 3D representations of the environment and the foreground objects, respectively. 2) Next, to produce physically plausible trajectories and realistic interactions for these objects, we propose a hybrid motion synthesis strategy that integrates priors from physical simulators with the common sense embedded in video diffusion models. 3) Finally, we develop an automated composition mechanism that seamlessly fuses the static environment and dynamic objects into coherent, physically consistent 4D scenes. Extensive experiments demonstrate that CP4D can generate explorable and interactive 4D scenes with high visual fidelity, strong physical plausibility, and fine-grained controllability, significantly outperforming existing methods. Project page: https://anonymous.4open.science/w/CP4D/.
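The compositional idea in the abstract (a static environment plus separately animated objects) can be illustrated with a toy sketch. This is not the CP4D implementation: the point lists stand in for the 3D representations of stage 1, the bouncing point-mass simulator stands in for the hybrid motion synthesis of stage 2 (no video diffusion prior is involved), and the names `Scene4D` and `simulate_drop` are hypothetical.

```python
from dataclasses import dataclass

Point = tuple[float, float, float]


def simulate_drop(height: float, steps: int, dt: float = 0.05,
                  g: float = 9.81, restitution: float = 0.5) -> list[Point]:
    """Toy stand-in for a physical simulator (assumption, not CP4D's):
    a point mass falls onto the ground plane z = 0 and bounces."""
    z, vz = height, 0.0
    trajectory: list[Point] = []
    for _ in range(steps):
        trajectory.append((0.0, 0.0, z))
        vz -= g * dt          # gravity updates the velocity
        z += vz * dt          # velocity updates the position
        if z < 0.0:           # ground contact: clamp and reflect
            z = 0.0
            vz = -vz * restitution
    return trajectory


@dataclass
class Scene4D:
    """Hypothetical container: static background plus one moving object."""
    static_points: list[Point]   # environment, identical in every frame
    object_points: list[Point]   # object geometry in its local frame
    trajectory: list[Point]      # per-frame object translation

    def frame(self, t: int) -> list[Point]:
        """Compose frame t: background unchanged, object translated."""
        ox, oy, oz = self.trajectory[t]
        moved = [(x + ox, y + oy, z + oz) for x, y, z in self.object_points]
        return self.static_points + moved


background = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ball = [(0.0, 0.0, 0.0)]
scene = Scene4D(background, ball, simulate_drop(height=2.0, steps=40))
first_frame = scene.frame(0)
```

Keeping the background out of the simulation loop is what makes the scene cheap to re-animate: only the trajectory changes when the object's initial conditions are edited.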

Computer Vision, World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
