May 6, 2026 | arXiv:2605.04475

Information Coordination as a Bridge: A Neuro-Symbolic Architecture for Reliable Autonomous Driving Scene Understanding

Shuo Liu, Lei Shi, Haowen Liu, Jing Xu, Yufei Gao

AI Summary

The paper introduces InfoCoordiBridge, a neuro-symbolic architecture for autonomous driving scene understanding that explicitly coordinates information between perception and language reasoning. It uses a BEV-centric approach with a unified multi-agent perception layer, an Information Coordination and Alignment (ICA) module for fusing multi-source outputs into a SceneSummary, and a SceneSummary-grounded reasoning and verification (SSRE) module. Experiments on nuScenes and Waymo datasets demonstrate that InfoCoordiBridge improves fusion consistency, reduces redundancy, and enhances factual grounding compared to existing VLM and agentic baselines.
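As a rough illustration of what fusing multi-source outputs into a single SceneSummary could look like, the sketch below merges per-sensor facts that share a label and lie close together in BEV coordinates, and records attribute disagreements instead of passing both versions on. This is not the paper's ICA module: the `Fact` and `FusedEntity` types, the `fuse` function, the greedy clustering rule, and the 1.5 m radius are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass
class Fact:
    """One typed fact from a perception agent (hypothetical schema)."""
    source: str   # e.g. "camera" or "lidar"
    label: str    # object class
    x: float      # BEV position in metres
    y: float
    attrs: dict = field(default_factory=dict)  # string-valued attributes


@dataclass
class FusedEntity:
    """One entry of the fused summary (hypothetical schema)."""
    label: str
    x: float
    y: float
    sources: list
    attrs: dict      # attributes on which all sources agree
    conflicts: dict  # attributes with disagreeing values


def fuse(facts, radius=1.5):
    """Greedily cluster same-label facts within `radius` metres, then merge."""
    clusters = []
    for fact in facts:
        for cluster in clusters:
            cx = sum(m.x for m in cluster) / len(cluster)
            cy = sum(m.y for m in cluster) / len(cluster)
            same_label = cluster[0].label == fact.label
            if same_label and hypot(fact.x - cx, fact.y - cy) <= radius:
                cluster.append(fact)
                break
        else:
            # No existing cluster matched, so this fact starts a new one.
            clusters.append([fact])

    summary = []
    for cluster in clusters:
        agreed, conflicts = {}, {}
        for member in cluster:
            for key, value in member.attrs.items():
                if key in conflicts:
                    conflicts[key].add(value)
                elif key in agreed and agreed[key] != value:
                    # First disagreement: move the key out of the agreed set.
                    conflicts[key] = {agreed.pop(key), value}
                else:
                    agreed[key] = value
        summary.append(FusedEntity(
            label=cluster[0].label,
            x=sum(m.x for m in cluster) / len(cluster),
            y=sum(m.y for m in cluster) / len(cluster),
            sources=sorted({m.source for m in cluster}),
            attrs=agreed,
            conflicts={k: sorted(v) for k, v in conflicts.items()},
        ))
    return summary
```

With a camera fact and a LiDAR fact for the same car about half a metre apart, `fuse` returns one entity that lists both sources and flags any attribute on which they disagree, so a downstream prompt sees one entry rather than two competing ones.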

Key Contribution

Rather than leaving the LLM to reason over redundant and conflicting sensor data, the architecture coordinates multi-sensor perception outputs into a single conflict-aware SceneSummary before reasoning, which reduces hallucinated entity mentions.

Abstract

Reliable autonomous driving requires scene understanding that is semantically consistent across heterogeneous sensors and verifiable at the reasoning stage. However, many recent LLM-driven driving systems attach the language model as a post-processor and force it to reason over redundant or conflicting perception outputs, which can amplify hallucinated entities and unsafe conclusions. This paper proposes InfoCoordiBridge, a BEV-centric neuro-symbolic architecture that inserts an explicit coordination bridge between perception and language reasoning. InfoCoordiBridge comprises (i) a unified multi-agent perception layer that outputs typed structured facts together with modality-focused synopses, (ii) an ICA module that aligns and fuses multi-source outputs into a single SceneSummary, and (iii) an SSRE module that performs SceneSummary-grounded reasoning with verification. Experiments on nuScenes and Waymo show that ICA preserves competitive 3D detection accuracy while substantially improving fusion consistency, reducing redundancy to below 1% and achieving about 98% attribute agreement. On NuScenes-QA and a template-aligned Waymo-QA benchmark, SSRE improves factual grounding and reduces hallucinated entity mentions compared with representative VLM and agentic baselines. Overall, by coordinating multi-sensor outputs into a single conflict-aware SceneSummary before prompting, InfoCoordiBridge prevents redundant and cross-modally inconsistent perception evidence from propagating into high-level reasoning.
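To make the idea of SceneSummary-grounded verification concrete, the following minimal sketch checks whether the entities named in an answer actually appear in a fused summary and reports the share that do not. The abstract does not describe how SSRE performs verification or how hallucinated entity mentions are scored, so the function names, the label-matching rule, and the rate definition here are assumptions chosen for illustration only.

```python
def ungrounded_mentions(summary_labels, claimed_entities):
    """Return claimed entities with no matching label in the summary.

    Matching is a case-insensitive exact comparison, which is a
    simplifying assumption; a real system would need richer matching.
    """
    known = {label.lower() for label in summary_labels}
    return [e for e in claimed_entities if e.lower() not in known]


def hallucination_rate(summary_labels, claimed_entities):
    """Fraction of claimed entities that the summary does not support."""
    if not claimed_entities:
        return 0.0
    missing = ungrounded_mentions(summary_labels, claimed_entities)
    return len(missing) / len(claimed_entities)
```

For example, if the summary contains the labels `car` and `pedestrian` and an answer mentions `Car` and `truck`, only `truck` is flagged, giving a rate of 0.5 under this definition.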

Computer Vision, Multimodal Models, Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 45

Year: 2026

Venue: N/A
