D visual features

Feb 25, 2026
arXiv:2602.21589

SEF-MAP: Subspace-Decomposed Expert Fusion for Robust Multimodal HD Map Prediction

Haoxiang Fu, Lingfeng Zhang, Ruibing Hu, Zhengrong Li, Guanjing Liu, Zimu Tan, Long Chen, Hangjun Ye, Xiaoshuai Hao

AI Summary

The paper introduces SEF-MAP, a subspace-expert fusion framework, to address inconsistencies in multi-modal HD map prediction by disentangling BEV features into LiDAR-private, Image-private, Shared, and Interaction subspaces, each handled by a dedicated expert. An uncertainty-aware gating mechanism adaptively combines expert outputs based on predictive variance, while a usage balance regularizer prevents expert collapse. Distribution-aware masking, simulating modality-drop scenarios with EMA-statistical surrogate features and a specialization loss, further enhances robustness, leading to state-of-the-art performance on nuScenes and Argoverse2 datasets.
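The modality-drop simulation mentioned above can be pictured with a small sketch. The summary does not say how the EMA statistics are computed or how surrogate features are drawn from them, so everything below is an assumption made for illustration: the class `EmaFeatureStats`, the helper `maybe_drop_modality`, the per-channel mean and variance tracking, and the Gaussian sampling are all hypothetical and are not taken from the paper.

```python
import random


class EmaFeatureStats:
    """Hypothetical tracker of per-channel EMA mean and variance for one modality."""

    def __init__(self, num_channels, momentum=0.99):
        self.momentum = momentum
        self.mean = [0.0] * num_channels
        self.var = [1.0] * num_channels
        self.initialized = False

    def update(self, features):
        """features: list of per-cell feature vectors, each of length num_channels."""
        n = len(features)
        channels = range(len(self.mean))
        batch_mean = [sum(f[c] for f in features) / n for c in channels]
        batch_var = [
            sum((f[c] - batch_mean[c]) ** 2 for f in features) / n for c in channels
        ]
        if not self.initialized:
            # The first batch seeds the running statistics directly.
            self.mean, self.var = batch_mean, batch_var
            self.initialized = True
            return
        m = self.momentum
        self.mean = [m * old + (1 - m) * new for old, new in zip(self.mean, batch_mean)]
        self.var = [m * old + (1 - m) * new for old, new in zip(self.var, batch_var)]

    def surrogate(self, num_cells, rng):
        """Assumed surrogate: Gaussian samples drawn from the EMA statistics."""
        return [
            [rng.gauss(mu, var ** 0.5) for mu, var in zip(self.mean, self.var)]
            for _ in range(num_cells)
        ]


def maybe_drop_modality(features, stats, drop_prob, rng):
    """Update the statistics with real features, then drop the modality with drop_prob."""
    stats.update(features)
    if rng.random() < drop_prob:
        return stats.surrogate(len(features), rng), True
    return features, False
```

Under these assumptions, a masked modality is replaced by features that match the running distribution of real features rather than by zeros, which is one plausible reading of "distribution-aware" masking.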

Key Contribution

By explicitly disentangling BEV features into semantic subspaces and assigning them to specialized experts, SEF-MAP achieves state-of-the-art HD map prediction, even when sensor data is degraded.

Abstract

High-definition (HD) maps are essential for autonomous driving, yet multi-modal fusion often suffers from inconsistency between camera and LiDAR modalities, leading to performance degradation under low-light conditions, occlusions, or sparse point clouds. To address this, we propose SEF-MAP, a Subspace-Expert Fusion framework for robust multimodal HD map prediction. The key idea is to explicitly disentangle BEV features into four semantic subspaces: LiDAR-private, Image-private, Shared, and Interaction. Each subspace is assigned a dedicated expert, thereby preserving modality-specific cues while capturing cross-modal consensus. To adaptively combine expert outputs, we introduce an uncertainty-aware gating mechanism at the BEV-cell level, where unreliable experts are down-weighted based on predictive variance, complemented by a usage balance regularizer to prevent expert collapse. To enhance robustness in degraded conditions and promote role specialization, we further propose distribution-aware masking: during training, modality-drop scenarios are simulated using EMA-statistical surrogate features, and a specialization loss enforces distinct behaviors of private, shared, and interaction experts across complete and masked inputs. Experiments on the nuScenes and Argoverse2 benchmarks demonstrate that SEF-MAP achieves state-of-the-art performance, surpassing prior methods by +4.2% and +4.8% in mAP, respectively. SEF-MAP provides a robust and effective solution for multi-modal HD map prediction under diverse and degraded conditions.
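The gating idea in the abstract can be illustrated for a single BEV cell. The abstract does not give the gating formula or the form of the regularizer, so this is only a sketch under stated assumptions: normalized inverse-variance weights stand in for the uncertainty-aware gate, and a squared deviation of mean expert usage from uniform stands in for the usage balance regularizer. The function names `uncertainty_gate`, `fuse_cell`, and `usage_balance_penalty` are hypothetical.

```python
def uncertainty_gate(variances, eps=1e-6):
    """Assumed gate: weights proportional to inverse predictive variance, summing to 1."""
    inverse = [1.0 / (v + eps) for v in variances]
    total = sum(inverse)
    return [x / total for x in inverse]


def fuse_cell(expert_outputs, variances):
    """Combine per-expert feature vectors for one BEV cell using the gate weights."""
    weights = uncertainty_gate(variances)
    dim = len(expert_outputs[0])
    fused = [
        sum(w * out[d] for w, out in zip(weights, expert_outputs)) for d in range(dim)
    ]
    return fused, weights


def usage_balance_penalty(weights_per_cell):
    """Assumed regularizer: squared deviation of mean expert usage from uniform."""
    num_cells = len(weights_per_cell)
    num_experts = len(weights_per_cell[0])
    mean_usage = [
        sum(w[k] for w in weights_per_cell) / num_cells for k in range(num_experts)
    ]
    uniform = 1.0 / num_experts
    return sum((u - uniform) ** 2 for u in mean_usage)


# Four experts (LiDAR-private, Image-private, Shared, Interaction); the last is unreliable.
outputs = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
fused, weights = fuse_cell(outputs, [0.1, 0.1, 0.1, 10.0])
```

In this toy case the high-variance expert receives the smallest weight, so the fused vector stays close to the three agreeing experts. The penalty is zero when average usage is uniform and grows as the gate concentrates on one expert, which is the collapse the regularizer is meant to discourage.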

Computer Vision, Multimodal Models, Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
