This paper introduces GraphBEV++, a multi-modal fusion framework that addresses feature misalignment in Bird's Eye View (BEV) perception for autonomous driving, particularly under LiDAR-camera calibration uncertainty. The framework uses two modules, LocalAlign-v2 and GlobalAlign-v2, to correct local and global misalignment respectively: the former through graph matching over neighborhood depth features, the latter through either learned cross-modal feature offsets or noise injection followed by denoising. The reported experiments show state-of-the-art performance under misalignment noise on nuScenes and a Waymo subset, improved long-range detection on Argoverse2, and more accurate and robust 3D occupancy prediction under both clean and noisy settings.
In end-to-end autonomous driving, GraphBEV++ is reported to outperform five baselines (UniAD, VAD, FusionAD, MomAD, and WoTE) across perception, prediction, and planning tasks in both open-loop and closed-loop evaluations.
Feature misalignment in BEV perception is a critical yet often overlooked challenge in autonomous driving, especially under calibration uncertainties between LiDAR and camera sensors. To address this issue, we propose a robust multi-modal fusion framework, GraphBEV++, which systematically mitigates projection-induced misalignment. The framework consists of two key modules: LocalAlign-v2 and GlobalAlign-v2. LocalAlign-v2 introduces neighborhood-aware depth features via graph matching to correct local misalignment. It supports both LSS-based and query-based BEV representations, making it compatible with BEVFusion and BEVFormer architectures for consistent cross-paradigm alignment. GlobalAlign-v2 encompasses two variants: Deformable and Diffusion. The Deformable variant addresses global misalignment in LSS-based multi-modal BEV by explicitly learning cross-modal feature offsets. In contrast, the Diffusion variant targets implicit misalignment in query-based BEV by injecting noise to simulate misalignment and employing a denoising process to recover aligned features. Experimental results show that GraphBEV++ achieves state-of-the-art performance under misalignment noise on nuScenes and a Waymo subset, improves long-range detection on Argoverse2, and generalizes effectively to the 3D occupancy prediction task, consistently improving occupancy estimation accuracy and robustness under both clean and noisy settings. Furthermore, GraphBEV++ effectively alleviates misalignment issues in end-to-end autonomous driving. Compared with five baselines (UniAD, VAD, FusionAD, MomAD, and WoTE), it demonstrates superior performance in both open-loop (nuScenes) and closed-loop (Bench2Drive and NAVSIM) evaluations across perception, prediction, and planning tasks.
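To make the projection-induced misalignment described in the abstract concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a simple pinhole camera with made-up intrinsics, models calibration error as a small yaw rotation, and uses a brute-force nearest-neighbor lookup in pixel space as a stand-in for gathering neighborhood depths. All function names, parameters, and point coordinates here are hypothetical.

```python
import math

def project(point, fx=1000.0, fy=1000.0, cx=800.0, cy=450.0):
    """Pinhole projection of a camera-frame point (x right, y down, z forward).
    The intrinsics are illustrative values, not taken from any dataset."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)

def yaw_rotate(point, angle_rad):
    """Rotate a point about the camera's vertical (y) axis.
    Used here as a simple model of an extrinsic calibration error."""
    x, y, z = point
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x + s * z, y, -s * x + c * z)

def knn_depths(pixels, depths, query_idx, k):
    """Depths of the k projected points closest in pixel space to the query
    point (the query itself included), ordered from nearest to farthest."""
    qu, qv = pixels[query_idx]
    order = sorted(
        range(len(pixels)),
        key=lambda i: (pixels[i][0] - qu) ** 2 + (pixels[i][1] - qv) ** 2,
    )
    return [depths[i] for i in order[:k]]

# Hypothetical LiDAR points already expressed in the camera frame (meters).
points = [(0.0, 0.0, 10.0), (0.1, 0.0, 10.2), (0.0, 0.1, 30.0), (5.0, 0.0, 50.0)]
depths = [p[2] for p in points]
yaw_error = math.radians(0.5)  # assumed calibration error of half a degree

clean = [project(p) for p in points]
noisy = [project(yaw_rotate(p, yaw_error)) for p in points]

# Pixel displacement of each point caused by the calibration error.
shift = [math.hypot(n[0] - c[0], n[1] - c[1]) for c, n in zip(clean, noisy)]

# Depths found in the pixel neighborhood of the first point.
neighbors = knn_depths(clean, depths, query_idx=0, k=3)

print(shift)
print(neighbors)
```

With these assumed values, a half-degree yaw error moves the on-axis point by several pixels, and the pixel neighborhood of a 10 m point also contains a 30 m point. This is the kind of ambiguity that motivates considering neighboring depths rather than trusting a single projected depth per pixel.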