The paper introduces InfoCoordiBridge, a neuro-symbolic architecture for autonomous driving scene understanding that explicitly coordinates information between perception and language reasoning. The bird's-eye-view (BEV)-centric design combines a unified multi-agent perception layer, an Information Coordination and Alignment (ICA) module that fuses multi-source outputs into a SceneSummary, and a SceneSummary-grounded reasoning and verification (SSRE) module. Experiments on the nuScenes and Waymo datasets show that InfoCoordiBridge improves fusion consistency, reduces redundancy, and improves factual grounding compared with existing VLM and agentic baselines.
Stop feeding LLMs redundant and conflicting sensor data in autonomous driving: a new architecture slashes hallucinated entities by coordinating multi-sensor inputs *before* reasoning.
Reliable autonomous driving requires scene understanding that is semantically consistent across heterogeneous sensors and verifiable at the reasoning stage. However, many recent LLM-driven driving systems attach the language model as a post-processor and force it to reason over redundant or conflicting perception outputs, which can amplify hallucinated entities and unsafe conclusions. This paper proposes InfoCoordiBridge, a BEV-centric neuro-symbolic architecture that inserts an explicit coordination bridge between perception and language reasoning. InfoCoordiBridge comprises (i) a unified multi-agent perception layer that outputs typed structured facts together with modality-focused synopses, (ii) an ICA module that aligns and fuses multi-source outputs into a single SceneSummary, and (iii) an SSRE module that performs SceneSummary-grounded reasoning with verification. Experiments on nuScenes and Waymo show that ICA preserves competitive 3D detection accuracy while substantially improving fusion consistency, reducing redundancy to below 1% and achieving about 98% attribute agreement. On NuScenes-QA and a template-aligned Waymo-QA benchmark, SSRE improves factual grounding and reduces hallucinated entity mentions compared with representative VLM and agentic baselines. Overall, by coordinating multi-sensor outputs into a single conflict-aware SceneSummary before prompting, InfoCoordiBridge prevents redundant and cross-modally inconsistent perception evidence from propagating into high-level reasoning.
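To make the "coordinate before reasoning" idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes each perception agent emits typed facts with a BEV position and a confidence score, merges facts that fall within a fixed radius into one entity, flags label disagreements as conflicts, and then checks an answer's entity mentions against the fused summary. The names `Fact`, `FusedEntity`, `fuse`, and `ungrounded_mentions`, the greedy distance-based association, and the 1.5 m radius are all hypothetical choices for illustration; the abstract does not specify how ICA or SSRE work internally.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import hypot


@dataclass
class Fact:
    """One typed detection from a single perception source (hypothetical schema)."""
    source: str   # e.g. "camera" or "lidar"
    label: str    # object class
    x: float      # BEV position in metres
    y: float
    score: float  # confidence, assumed > 0


@dataclass
class FusedEntity:
    """One entry of the fused scene summary (hypothetical schema)."""
    label: str
    x: float
    y: float
    sources: list[str]
    conflict: bool  # True when the merged sources disagree on the label


def fuse(facts: list[Fact], radius: float = 1.5) -> list[FusedEntity]:
    """Greedy BEV association: each fact joins the first group whose
    highest-scoring member lies within `radius`, else starts a new group."""
    groups: list[list[Fact]] = []
    for fact in sorted(facts, key=lambda f: -f.score):
        for group in groups:
            anchor = group[0]  # highest-scoring member of the group
            if hypot(fact.x - anchor.x, fact.y - anchor.y) <= radius:
                group.append(fact)
                break
        else:
            groups.append([fact])

    summary = []
    for group in groups:
        total = sum(m.score for m in group)
        summary.append(FusedEntity(
            label=group[0].label,  # keep the most confident label
            x=sum(m.x * m.score for m in group) / total,  # score-weighted mean
            y=sum(m.y * m.score for m in group) / total,
            sources=sorted({m.source for m in group}),
            conflict=len({m.label for m in group}) > 1,
        ))
    return summary


def ungrounded_mentions(mentions: list[str], summary: list[FusedEntity]) -> list[str]:
    """Return entity labels mentioned in an answer that the summary does not contain."""
    known = {entity.label for entity in summary}
    return [m for m in mentions if m not in known]


facts = [
    Fact("lidar", "car", 10.0, 2.0, 0.9),
    Fact("camera", "car", 10.4, 2.2, 0.7),       # duplicate of the lidar car
    Fact("camera", "pedestrian", 3.0, -1.0, 0.8),
]
summary = fuse(facts)
flagged = ungrounded_mentions(["car", "truck"], summary)
```

In this toy scene the two car detections collapse into a single entity attributed to both sensors, so a downstream prompt would see one car rather than two, and `flagged` contains the label `"truck"` because no fused entity supports it. A real system would need proper data association (for example, 3D box overlap and tracking) and attribute-level conflict handling, which this sketch deliberately omits.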