Apr 23, 2026

arXiv:2604.21841

Cross-Modal Phantom: Coordinated Camera-LiDAR Spoofing Against Multi-Sensor Fusion in Autonomous Vehicles

AI Summary

This paper investigates a vulnerability in multi-sensor fusion (MSF) systems used in autonomous vehicles, where coordinated attacks can bypass redundancy by creating false cross-sensor consistency. The authors simulate a coordinated attack by injecting perspective-aware image patches and synthetic LiDAR point clusters into the sensor data, mimicking the effects of synchronized physical spoofing sources. Experiments on 400 KITTI scenes demonstrate that this coordinated spoofing achieves an 85.5% success rate in deceiving a state-of-the-art perception model, highlighting a critical flaw in MSF-based perception.

Key Contribution

Autonomous vehicles can be fooled by coordinated camera and LiDAR attacks that create "phantom" objects, even when using multi-sensor fusion designed for redundancy.

Abstract

Autonomous Vehicles (AVs) increasingly depend on Multi-Sensor Fusion (MSF) to combine complementary modalities such as cameras and LiDAR for robust perception. While this redundancy is intended to safeguard against single-sensor failures, the fusion process itself introduces a subtle and underexplored vulnerability. In this work, we investigate whether an attacker can bypass MSF's redundancy by fabricating cross-sensor consistency, making multiple sensors agree on the same false object. We design a coordinated, data-level (early-fusion) attack that emulates the outcome of two synchronized physical spoofing sources: an infrared (IR) projection that induces a false camera detection and a LiDAR signal injection that produces a matching 3D point cluster. Rather than implementing the physical attack hardware, we simulate its sensor-level outcomes by inserting perspective-aware image patches and synthetic LiDAR point clusters aligned in 3D space. This approach preserves the perceptual effects that real IR projection and intentional electromagnetic interference (IEMI)-based spoofing would create at the sensor output. Using 400 KITTI scenes, our large-scale evaluation shows that the coordinated spoofing deceives a state-of-the-art perception model with an 85.5% attack success rate. These findings provide the first quantitative evidence that malicious cross-modal consistency can compromise MSF-based perception, revealing a critical vulnerability in the core data-fusion logic of modern autonomous vehicle systems.
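The geometric idea behind "aligned in 3D space" can be illustrated with a small sketch. This is not the authors' implementation: it assumes a simple pinhole camera with made-up intrinsics (`FX`, `FY`, `CX`, `CY`), assumes the LiDAR points are already expressed in the camera frame (x right, y down, z forward), and uses hypothetical helper names throughout. It places a phantom box in 3D, derives the image rectangle where a perspective-aware patch would go, samples a synthetic point cluster on the box face nearest the sensor, and checks that every cluster point projects inside the patch rectangle.

```python
import random

# Hypothetical pinhole intrinsics in pixels (assumption, not KITTI calibration).
FX, FY, CX, CY = 720.0, 720.0, 620.0, 190.0

def project(point):
    """Project a 3D point (x right, y down, z forward, metres) to pixel (u, v)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (FX * x / z + CX, FY * y / z + CY)

def box_corners(center, size):
    """Eight corners of an axis-aligned box with centre (x, y, z) and size (w, h, d)."""
    cx, cy, cz = center
    w, h, d = size
    return [(cx + sx * w / 2, cy + sy * h / 2, cz + sz * d / 2)
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]

def patch_rectangle(center, size):
    """Image rectangle (u_min, v_min, u_max, v_max) covering the projected box."""
    pixels = [project(c) for c in box_corners(center, size)]
    us = [u for u, _ in pixels]
    vs = [v for _, v in pixels]
    return (min(us), min(vs), max(us), max(vs))

def synthetic_cluster(center, size, n_points, seed=0):
    """Sample points on the box face nearest the sensor (a stand-in for LiDAR returns)."""
    rng = random.Random(seed)
    cx, cy, cz = center
    w, h, d = size
    near_z = cz - d / 2
    return [(cx + rng.uniform(-w / 2, w / 2),
             cy + rng.uniform(-h / 2, h / 2),
             near_z)
            for _ in range(n_points)]

def is_cross_modal_consistent(rect, cluster):
    """True if every cluster point projects inside the image patch rectangle."""
    u_min, v_min, u_max, v_max = rect
    return all(u_min <= u <= u_max and v_min <= v <= v_max
               for u, v in map(project, cluster))

# Illustrative phantom: a car-sized box centred 12 m ahead (values are assumptions).
phantom_center = (0.0, 1.0, 12.0)
phantom_size = (1.8, 1.5, 4.0)
rect = patch_rectangle(phantom_center, phantom_size)
cluster = synthetic_cluster(phantom_center, phantom_size, n_points=50)
print(is_cross_modal_consistent(rect, cluster))
```

A real pipeline would additionally need the dataset's camera-LiDAR extrinsics, a rendered patch texture, and realistic point density and occlusion handling, none of which are modeled here.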

Computer Vision, Red-Teaming & Adversarial Robustness, Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 31

Year: 2026

Venue: N/A
