Mar 8, 2026 | arXiv:2603.07552

ReconDrive: Fast Feed-Forward 4D Gaussian Splatting for Autonomous Driving Scene Reconstruction

Haibao Yu, Kuntao Xiao, Jiahang Wang, Ruiyang Hao, Yuxin Huang, Guoran Hu, Haifang Qin, Bowen Jing, Yuntian Bo, Ping Luo

AI Summary

ReconDrive is a feed-forward framework that builds on the VGGT 3D foundation model to generate high-fidelity 4D Gaussian Splatting (4DGS) for autonomous driving scene reconstruction. It addresses the cost of per-scene optimization and the degraded photometric quality of existing feed-forward methods with two components: hybrid Gaussian prediction heads that decouple spatial and appearance regression, and a static-dynamic 4D composition strategy that models temporal motion through velocity. On nuScenes, ReconDrive outperforms existing feed-forward baselines in reconstruction, novel-view synthesis, and 3D perception, and achieves quality competitive with per-scene optimization while being orders of magnitude faster.

Key Contribution

Forget slow per-scene optimization: ReconDrive uses a fast feed-forward approach to generate high-fidelity 4D Gaussian Splatting for autonomous driving, rivaling optimization-based methods in quality while being orders of magnitude faster.

Abstract

High-fidelity visual reconstruction and novel-view synthesis are essential for realistic closed-loop evaluation in autonomous driving. While 4D Gaussian Splatting (4DGS) offers a promising balance of accuracy and efficiency, existing per-scene optimization methods require costly iterative refinement, rendering them unscalable for extensive urban environments. Conversely, current feed-forward approaches often suffer from degraded photometric quality. To address these limitations, we propose ReconDrive, a feed-forward framework that leverages and extends the 3D foundation model VGGT for rapid, high-fidelity 4DGS generation. Our architecture introduces two core adaptations to tailor the foundation model to dynamic driving scenes: (1) Hybrid Gaussian Prediction Heads, which decouple the regression of spatial coordinates and appearance attributes to overcome the photometric deficiencies inherent in generalized foundation features; and (2) a Static-Dynamic 4D Composition strategy that explicitly captures temporal motion via velocity modeling to represent complex dynamic environments. Benchmarked on nuScenes, ReconDrive significantly outperforms existing feed-forward baselines in reconstruction, novel-view synthesis, and 3D perception. It achieves performance competitive with per-scene optimization while being orders of magnitude faster, providing a scalable and practical solution for realistic driving simulation.
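
The abstract does not give the formulation of the Static-Dynamic 4D Composition, so the following is only a minimal sketch of one way velocity modeling could work, assuming each dynamic Gaussian moves at a constant velocity from a reference time while static Gaussians stay fixed. The names `Gaussian`, `is_dynamic`, `compose_at_time`, and `t_ref` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Gaussian:
    center: Vec3      # position at the reference time t_ref
    velocity: Vec3    # per-Gaussian velocity (assumed constant over time)
    is_dynamic: bool  # hypothetical flag separating static and dynamic parts


def compose_at_time(gaussians: List[Gaussian], t: float, t_ref: float = 0.0) -> List[Vec3]:
    """Return Gaussian centers at time t under a constant-velocity assumption."""
    dt = t - t_ref
    centers: List[Vec3] = []
    for g in gaussians:
        if g.is_dynamic:
            # Dynamic Gaussians are translated along their velocity.
            moved = tuple(c + v * dt for c, v in zip(g.center, g.velocity))
            centers.append(moved)
        else:
            # Static Gaussians keep their reference-time position.
            centers.append(g.center)
    return centers


# Toy scene: one static Gaussian and one dynamic Gaussian moving along x.
scene = [
    Gaussian(center=(0.0, 0.0, 0.0), velocity=(0.0, 0.0, 0.0), is_dynamic=False),
    Gaussian(center=(10.0, 2.0, 0.0), velocity=(5.0, 0.0, 0.0), is_dynamic=True),
]
centers_half_second_later = compose_at_time(scene, t=0.5)
```

In this toy scene the static Gaussian stays at the origin, while the dynamic one is shifted by velocity times elapsed time. A full 4DGS pipeline would also carry covariance, opacity, and color per Gaussian and would render the composed set, all of which this sketch leaves out.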

Computer Vision | Robotics & Embodied AI | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
