Search papers, labs, and topics across Lattice.
The paper introduces WhisperNet, a bandwidth-aware collaborative perception framework that uses a receiver-centric approach to dynamically allocate feature contributions across agents and features. WhisperNet generates lightweight saliency metadata at the sender and formulates a global request plan at the receiver to retrieve only the most informative features, followed by a collaborative feature routing module for alignment. Experiments on OPV2V demonstrate that WhisperNet achieves state-of-the-art performance, improving AP@0.7 by 2.4% with only 0.5% of the communication cost, and boosts strong baselines with only 5% bandwidth while maintaining robustness.
Achieve state-of-the-art collaborative perception with WhisperNet, a receiver-centric framework that slashes communication costs to just 0.5% while improving AP@0.7 by 2.4%.
Collaborative perception is vital for autonomous driving yet remains constrained by tight communication budgets. Earlier work reduced bandwidth by compressing full feature maps with fixed-rate encoders, which adapts poorly to a changing environment, and it further evolved into spatial selection methods that improve efficiency by focusing on salient regions, but this object-centric approach often sacrifices global context, weakening holistic scene understanding. To overcome these limitations, we introduce \textit{WhisperNet}, a bandwidth-aware framework that proposes a novel, receiver-centric paradigm for global coordination across agents. Senders generate lightweight saliency metadata, while the receiver formulates a global request plan that dynamically budgets feature contributions across agents and features, retrieving only the most informative features. A collaborative feature routing module then aligns related messages before fusion to ensure structural consistency. Extensive experiments show that WhisperNet achieves state-of-the-art performance, improving AP@0.7 on OPV2V by 2.4\% with only 0.5\% of the communication cost. As a plug-and-play component, it boosts strong baselines with merely 5\% of full bandwidth while maintaining robustness under localization noise. These results demonstrate that globally-coordinated allocation across \textit{what} and \textit{where} to share is the key to achieving efficient collaborative perception.