The paper introduces RESBev, a plug-and-play module designed to enhance the robustness of BEV perception models against sensor degradation and adversarial attacks. RESBev reframes robustness as a latent semantic prediction problem, using a latent world model to learn spatiotemporal correlations in BEV features and predict clean features to reconstruct corrupted observations. Experiments on nuScenes show that RESBev significantly improves the robustness of existing BEV perception models with few-shot fine-tuning.
A plug-and-play module, RESBev, fortifies BEV perception against sensor degradation and adversarial attacks by learning latent BEV state transitions, offering a practical route to more reliable autonomous driving systems.
Bird's-eye-view (BEV) perception has emerged as a cornerstone of autonomous driving systems, providing a structured, ego-centric representation critical for downstream planning and control. However, real-world deployment faces challenges from sensor degradation and adversarial attacks, which can cause severe perceptual anomalies and ultimately compromise the safety of autonomous driving systems. To address this, we propose a resilient and plug-and-play BEV perception method, RESBev, which can be easily applied to existing BEV perception methods to enhance their robustness to diverse disturbances. Specifically, we reframe perception robustness as a latent semantic prediction problem. A latent world model is constructed to extract spatiotemporal correlations across sequential BEV observations, thereby learning the underlying BEV state transitions to predict clean BEV features for reconstructing corrupted observations. The proposed framework operates at the semantic feature level of the Lift-Splat-Shoot pipeline, enabling recovery that generalizes across both natural disturbances and adversarial attacks without modifying the underlying backbone. Extensive experiments on the nuScenes dataset demonstrate that, with few-shot fine-tuning, RESBev significantly improves the robustness of existing BEV perception models against various external disturbances and adversarial attacks.
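The predict-then-reconstruct idea described in the abstract can be illustrated with a toy example. This is only a sketch under strong assumptions, not the RESBev architecture: linear extrapolation from the last two frames stands in for the learned latent world model, and a fixed residual threshold stands in for whatever mechanism the paper uses to decide which features to replace. The function names `predict_clean` and `reconstruct`, the `threshold` parameter, and the synthetic feature maps are all hypothetical.

```python
import numpy as np


def predict_clean(history):
    """Extrapolate the next BEV feature map from the last two frames.

    Assumption: linear extrapolation stands in for the learned latent
    world model; it is not the paper's predictor.
    """
    return 2.0 * history[-1] - history[-2]


def reconstruct(history, observed, threshold=1.0):
    """Replace feature cells that disagree with the temporal prediction.

    `threshold` is a hypothetical hand-set tolerance, not a value from the paper.
    """
    predicted = predict_clean(history)
    corrupted = np.abs(observed - predicted) > threshold
    restored = np.where(corrupted, predicted, observed)
    return restored, corrupted


# Toy BEV features of shape (channels, height, width) that drift linearly over time.
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 8, 8))
drift = 0.1 * rng.normal(size=(4, 8, 8))
frames = [base + t * drift for t in range(4)]
history, clean = frames[:3], frames[3]

# Simulate a disturbance: a large offset on a 3x3 patch across all channels.
observed = clean.copy()
observed[:, 2:5, 2:5] += 5.0

restored, mask = reconstruct(history, observed)
print(int(mask.sum()), float(np.abs(restored - clean).max()))
```

In this toy setting the temporal prediction is exact, so the flagged patch is restored to the clean features and the untouched cells pass through unchanged. Real BEV features do not evolve linearly, which is presumably why the paper learns the state transitions instead of assuming them.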