ReconDrive is a feed-forward framework built on the VGGT 3D foundation model that generates high-fidelity 4D Gaussian Splatting (4DGS) representations for autonomous driving scene reconstruction. It targets two weaknesses of prior work: the costly iterative refinement of per-scene optimization and the degraded photometric quality of existing feed-forward methods. To do so, it uses hybrid Gaussian prediction heads that decouple spatial and appearance regression, and a static-dynamic 4D composition strategy that models temporal motion through velocity. On nuScenes, ReconDrive outperforms existing feed-forward baselines in reconstruction, novel-view synthesis, and 3D perception, and is competitive with per-scene optimization while being orders of magnitude faster.
Forget slow per-scene optimization: ReconDrive uses a fast feed-forward approach to generate high-fidelity 4D Gaussian Splatting for autonomous driving, rivaling optimization-based methods in quality while being orders of magnitude faster.
High-fidelity visual reconstruction and novel-view synthesis are essential for realistic closed-loop evaluation in autonomous driving. While 4D Gaussian Splatting (4DGS) offers a promising balance of accuracy and efficiency, existing per-scene optimization methods require costly iterative refinement, rendering them unscalable for extensive urban environments. Conversely, current feed-forward approaches often suffer from degraded photometric quality. To address these limitations, we propose ReconDrive, a feed-forward framework that leverages and extends the 3D foundation model VGGT for rapid, high-fidelity 4DGS generation. Our architecture introduces two core adaptations to tailor the foundation model to dynamic driving scenes: (1) Hybrid Gaussian Prediction Heads, which decouple the regression of spatial coordinates and appearance attributes to overcome the photometric deficiencies inherent in generalized foundation features; and (2) a Static-Dynamic 4D Composition strategy that explicitly captures temporal motion via velocity modeling to represent complex dynamic environments. Benchmarked on nuScenes, ReconDrive significantly outperforms existing feed-forward baselines in reconstruction, novel-view synthesis, and 3D perception. It achieves performance competitive with per-scene optimization while being orders of magnitude faster, providing a scalable and practical solution for realistic driving simulation.
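The static-dynamic composition described in the abstract can be illustrated with a minimal sketch. It is not ReconDrive's implementation. It assumes a constant-velocity (linear) motion model, under which each dynamic Gaussian's center moves as center + velocity * (t - t_ref) and static Gaussians stay fixed. The `Gaussian` class, the `compose_centers` function, and the `is_dynamic` flag are hypothetical names introduced only for illustration; the abstract does not specify how velocity is parameterized or how static and dynamic content is separated.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Gaussian:
    """Hypothetical container holding only the fields needed for motion."""
    center: Vec3      # position at the reference time t_ref
    velocity: Vec3    # assumed units: metres per second
    is_dynamic: bool  # assumed flag separating moving from static content


def compose_centers(gaussians: List[Gaussian], t: float, t_ref: float = 0.0) -> List[Vec3]:
    """Return Gaussian centers at time t under an assumed constant-velocity model."""
    dt = t - t_ref
    centers: List[Vec3] = []
    for g in gaussians:
        if g.is_dynamic:
            # Dynamic Gaussians are translated along their velocity vector.
            moved = tuple(c + v * dt for c, v in zip(g.center, g.velocity))
            centers.append(moved)
        else:
            # Static Gaussians are shared unchanged across all timestamps.
            centers.append(g.center)
    return centers


scene = [
    Gaussian(center=(0.0, 0.0, 0.0), velocity=(0.0, 0.0, 0.0), is_dynamic=False),
    Gaussian(center=(10.0, 2.0, 0.0), velocity=(5.0, 0.0, 0.0), is_dynamic=True),
]
centers = compose_centers(scene, t=0.5)
```

Here the static Gaussian stays at the origin, while the dynamic one, moving at 5 m/s along x, is shifted 2.5 m after half a second. A full 4DGS pipeline would also carry covariance, opacity, and color per Gaussian; those are omitted because the abstract gives no detail on how they evolve over time.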