The paper introduces FUSCO, a distributed agentic pipeline for detecting semantic corner cases in autonomous driving by dynamically fusing RGB camera and LiDAR data. FUSCO uses an edge-deployed large vision-language model (LVLM) to generate RDF triples describing potential corner cases from RGB images, and selectively invokes LiDAR data fusion via a confidence-aware prompt engine. Experiments on synthetic and real-world datasets demonstrate that FUSCO improves the F1 score in low-visibility corner cases while reducing data transmission.
Achieve up to a 24% absolute F1 gain in detecting autonomous-vehicle corner cases by selectively fusing RGB and LiDAR data at the edge, without model retraining, demonstrating the power of confidence-aware multimodal prompting.
Current machine learning algorithms for autonomous vehicles (AVs) struggle to detect “corner cases”: infrequent situations that fall outside standard training data. To address this issue, in this paper we present FUSCO (Fusion-based Semantic Corner-case reasoning), an agentic, distributed machine learning pipeline that dynamically fuses RGB camera and LiDAR data by leveraging an edge computing architecture. In the proposed framework, a pretrained large vision-language model (LVLM) deployed at the edge server processes RGB images produced by the vehicle and generates Resource Description Framework (RDF) triples, machine-readable subject-predicate-object assertions that describe specific corner cases. A confidence-aware prompt engine then decides whether to invoke multimodal fusion: if the adjusted confidence falls below a threshold, the edge server requests LiDAR bird’s-eye-view scans produced by a specialized deep neural network and reissues an augmented prompt combining the RGB and LiDAR inputs. This selective fusion controls both data transmission and compute by invoking the communication- and compute-intensive LiDAR maps only when needed. FUSCO operates in zero-shot mode without any model retraining, preserving low latency under favorable environmental conditions and adapting seamlessly under uncertainty. Experiments on synthetic (3CSim) and real-world (KITTI) benchmarks demonstrate up to a 24% absolute F1 gain in low-visibility corner cases, while reducing average data transmission by 35% and enabling consistent, ontology-aligned vehicle-to-everything (V2X) communication.
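The confidence-gated fusion step described above can be sketched as follows. This is a minimal illustrative Python sketch, not the paper's implementation: the function names, the visibility-based confidence adjustment, and the 0.6 threshold are all assumptions made for illustration; the LiDAR request is stubbed as a callback.

```python
# Hypothetical sketch of FUSCO-style confidence-gated selective fusion.
# All names and the adjustment heuristic are illustrative assumptions,
# not taken from the paper.

from dataclasses import dataclass

@dataclass
class LVLMResult:
    triples: list       # RDF (subject, predicate, object) assertions
    confidence: float   # model-reported confidence in [0, 1]

def adjust_confidence(raw: float, visibility: float) -> float:
    """Down-weight raw confidence in low-visibility scenes (assumed heuristic)."""
    return raw * visibility

def detect_corner_case(rgb_result: LVLMResult,
                       visibility: float,
                       threshold: float = 0.6,
                       request_lidar=None):
    """Return (triples, used_lidar). LiDAR fusion is invoked only when
    the adjusted confidence falls below the threshold, so the costly
    bird's-eye-view transfer happens on demand."""
    adjusted = adjust_confidence(rgb_result.confidence, visibility)
    if adjusted >= threshold or request_lidar is None:
        return rgb_result.triples, False
    # Low confidence: fetch the LiDAR bird's-eye-view map and reissue an
    # augmented RGB+LiDAR prompt (both stubbed here by the callback).
    fused = request_lidar(rgb_result.triples)
    return fused.triples, True
```

In a clear scene (high visibility) the RGB-only triples pass through unchanged; in fog or at night the adjusted confidence drops below the threshold and the edge server pulls in the LiDAR map before answering, which is what keeps average data transmission low.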