The paper introduces SEF-MAP, a subspace-expert fusion framework, to address inconsistencies in multi-modal HD map prediction by disentangling BEV features into LiDAR-private, Image-private, Shared, and Interaction subspaces, each handled by a dedicated expert. An uncertainty-aware gating mechanism adaptively combines expert outputs based on predictive variance, while a usage balance regularizer prevents expert collapse. Distribution-aware masking, simulating modality-drop scenarios with EMA-statistical surrogate features and a specialization loss, further enhances robustness, leading to state-of-the-art performance on nuScenes and Argoverse2 datasets.
By explicitly disentangling BEV features into semantic subspaces and assigning them to specialized experts, SEF-MAP achieves state-of-the-art HD map prediction, even when sensor data is degraded.
High-definition (HD) maps are essential for autonomous driving, yet multi-modal fusion often suffers from inconsistency between camera and LiDAR modalities, leading to performance degradation under low-light conditions, occlusions, or sparse point clouds. To address this, we propose SEF-MAP, a Subspace-Expert Fusion framework for robust multi-modal HD map prediction. The key idea is to explicitly disentangle BEV features into four semantic subspaces: LiDAR-private, Image-private, Shared, and Interaction. Each subspace is assigned a dedicated expert, thereby preserving modality-specific cues while capturing cross-modal consensus. To adaptively combine expert outputs, we introduce an uncertainty-aware gating mechanism at the BEV-cell level, where unreliable experts are down-weighted based on predictive variance, complemented by a usage balance regularizer to prevent expert collapse. To enhance robustness in degraded conditions and promote role specialization, we further propose distribution-aware masking: during training, modality-drop scenarios are simulated using EMA-statistical surrogate features, and a specialization loss enforces distinct behaviors of private, shared, and interaction experts across complete and masked inputs. Experiments on the nuScenes and Argoverse2 benchmarks demonstrate that SEF-MAP achieves state-of-the-art performance, surpassing prior methods by +4.2% and +4.8% in mAP, respectively. SEF-MAP provides a robust and effective solution for multi-modal HD map prediction under diverse and degraded conditions.
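The abstract does not give the gating formula, so the following is only a minimal sketch of how uncertainty-aware gating for one BEV cell might look. It assumes inverse-variance weighting (a softmax over negative log-variance) and a squared-deviation penalty for usage balance. The names `gate_weights`, `fuse_cell`, `usage_balance_penalty`, and the `temperature` parameter are hypothetical and are not taken from the paper.

```python
import math

# Expert order assumed for illustration only.
EXPERTS = ("lidar_private", "image_private", "shared", "interaction")


def gate_weights(variances, temperature=1.0, eps=1e-8):
    """Softmax over negative log-variance for one BEV cell.

    With temperature=1 this reduces to inverse-variance weighting,
    so experts with higher predictive variance get smaller weights.
    """
    logits = [-math.log(v + eps) / temperature for v in variances]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_cell(expert_outputs, variances, temperature=1.0):
    """Weighted sum of per-expert feature vectors for one BEV cell."""
    weights = gate_weights(variances, temperature)
    dim = len(expert_outputs[0])
    fused = [
        sum(weights[k] * expert_outputs[k][d] for k in range(len(weights)))
        for d in range(dim)
    ]
    return fused, weights


def usage_balance_penalty(weights_per_cell):
    """Squared deviation of mean expert usage from the uniform share.

    Zero when every expert receives the same average weight; grows as
    the gate concentrates on a few experts (assumed regularizer form).
    """
    n_cells = len(weights_per_cell)
    n_experts = len(weights_per_cell[0])
    mean_usage = [
        sum(w[k] for w in weights_per_cell) / n_cells for k in range(n_experts)
    ]
    return sum((u - 1.0 / n_experts) ** 2 for u in mean_usage)


if __name__ == "__main__":
    # Toy cell: the image-private expert is unreliable (e.g. low light).
    outputs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
    variances = [0.1, 10.0, 1.0, 1.0]
    fused, weights = fuse_cell(outputs, variances)
    for name, w in zip(EXPERTS, weights):
        print(f"{name}: {w:.3f}")
    print("fused:", [round(x, 3) for x in fused])
```

In this toy setup the high-variance image-private expert contributes least to the fused cell, which matches the down-weighting behavior the abstract describes. A learned gate network would replace the fixed softmax in practice.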