KIT | Mar 10, 2026 | arXiv:2603.09737

$M^2$-Occ: Resilient 3D Semantic Occupancy Prediction for Autonomous Driving with Incomplete Camera Inputs

Kaixin Lin, Kunyu Peng, Di Wen, Yufan Chen, Ruiping Liu

AI Summary

The paper introduces $M^2$-Occ, a framework for resilient 3D semantic occupancy prediction in autonomous driving scenarios with incomplete camera inputs. It uses a Multi-view Masked Reconstruction (MMR) module to recover missing-view representations and a Feature Memory Module (FMM) to refine ambiguous voxel features using class-level semantic prototypes. Experiments on nuScenes-based SurroundOcc demonstrate that $M^2$-Occ significantly improves IoU under various missing-view scenarios without compromising full-view performance, achieving up to 5.01% improvement with five missing views.

Key Contribution

Autonomous vehicles can now better "see" the world even when cameras fail, thanks to a new method that fills in the blanks by leveraging spatial overlaps and learned semantic priors.

Abstract

Semantic occupancy prediction enables dense 3D geometric and semantic understanding for autonomous driving. However, existing camera-based approaches implicitly assume complete surround-view observations, an assumption that rarely holds in real-world deployment due to occlusion, hardware malfunction, or communication failures. We study semantic occupancy prediction under incomplete multi-camera inputs and introduce $M^2$-Occ, a framework designed to preserve geometric structure and semantic coherence when views are missing. $M^2$-Occ addresses two complementary challenges. First, a Multi-view Masked Reconstruction (MMR) module leverages the spatial overlap among neighboring cameras to recover missing-view representations directly in the feature space. Second, a Feature Memory Module (FMM) introduces a learnable memory bank that stores class-level semantic prototypes. By retrieving and integrating these global priors, the FMM refines ambiguous voxel features, ensuring semantic consistency even when observational evidence is incomplete. We introduce a systematic missing-view evaluation protocol on the nuScenes-based SurroundOcc benchmark, encompassing both deterministic single-view failures and stochastic multi-view dropout scenarios. Under the safety-critical missing back-view setting, $M^2$-Occ improves the IoU by 4.93%. As the number of missing cameras increases, the robustness gap further widens; for instance, under the setting with five missing views, our method boosts the IoU by 5.01%. These gains are achieved without compromising full-view performance. The source code will be publicly released at https://github.com/qixi7up/M2-Occ.
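The missing-view protocol and the prototype memory described above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the function names, the softmax-attention read, the residual update, and the `temperature` parameter are all assumptions made for illustration, and the abstract does not specify how the FMM retrieves or integrates prototypes. The camera names are the six standard nuScenes camera channels.

```python
import math
import random

# The six surround-view camera channels used by nuScenes.
CAMERAS = [
    "CAM_FRONT", "CAM_FRONT_RIGHT", "CAM_FRONT_LEFT",
    "CAM_BACK", "CAM_BACK_LEFT", "CAM_BACK_RIGHT",
]


def single_view_mask(missing):
    """Deterministic single-view failure: keep every camera except `missing`."""
    if missing not in CAMERAS:
        raise ValueError(f"unknown camera: {missing}")
    return [name != missing for name in CAMERAS]


def random_dropout_mask(num_missing, rng):
    """Stochastic multi-view dropout: drop `num_missing` cameras at random.

    At least one camera is always kept (an assumption of this sketch).
    """
    if not 0 <= num_missing < len(CAMERAS):
        raise ValueError("num_missing must leave at least one camera")
    dropped = set(rng.sample(CAMERAS, num_missing))
    return [name not in dropped for name in CAMERAS]


def refine_with_prototypes(voxel, prototypes, temperature=1.0):
    """Hypothetical memory read: attend over class prototypes, add residually.

    voxel:      feature vector of one voxel (list of floats)
    prototypes: one prototype vector per semantic class (list of lists)
    """
    # Dot-product similarity between the voxel feature and each prototype.
    scores = [
        sum(v * p for v, p in zip(voxel, proto)) / temperature
        for proto in prototypes
    ]
    # Numerically stable softmax over classes.
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of prototypes, i.e. the retrieved global prior.
    retrieved = [
        sum(w * proto[d] for w, proto in zip(weights, prototypes))
        for d in range(len(voxel))
    ]
    # Residual update keeps the original evidence and adds the prior.
    return [v + r for v, r in zip(voxel, retrieved)]


if __name__ == "__main__":
    print(single_view_mask("CAM_BACK"))
    print(random_dropout_mask(5, random.Random(0)))
    print(refine_with_prototypes([1.0, 0.0], [[10.0, 0.0], [0.0, 10.0]]))
```

In this sketch a mask entry of `False` marks a camera whose features would have to be reconstructed rather than observed. A voxel whose feature is closest to one prototype is pulled toward that prototype, which is one simple way a class-level prior could stabilise features when observational evidence is incomplete.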

Computer Vision | Multimodal Models | Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
