The paper introduces DriveAgent, a modular multi-agent autonomous driving framework that uses an LLM to orchestrate specialized agents processing multimodal sensor data (camera, LiDAR, IMU, GPS) for perception, reasoning, and action planning. DriveAgent employs a pipeline of agents for descriptive analysis, vehicle-level assessment, environmental reasoning, and urgency-aware decision generation. Experiments demonstrate that DriveAgent achieves a 26.31% improvement in vehicle reasoning and up to a 2.85% improvement in environmental reasoning over baselines, showcasing the benefits of LLM-driven multi-agent sensor fusion.
An LLM orchestrating specialized sensor-processing agents can substantially improve an autonomous vehicle's reasoning about both its own state and its environment.
We introduce DriveAgent, a modular multi-agent framework that combines large language model (LLM) reasoning with multimodal sensor fusion for autonomous driving. DriveAgent orchestrates specialized agents operating on camera, Light Detection and Ranging (LiDAR), Inertial Measurement Unit (IMU), and Global Positioning System (GPS) data with LLM-driven analytical processes to deliver temporally aligned perception, causal reasoning, and action recommendations. The framework operates through a modular agent-based pipeline comprising four principal modules: (i) a descriptive analysis agent that identifies critical sensor events based on filtered timestamps, (ii) dedicated vehicle-level analysis conducted by LiDAR and vision agents that collaboratively assess vehicle conditions and movements, (iii) environmental reasoning and causal analysis agents that explain contextual changes and their underlying mechanisms, and (iv) an urgency-aware decision-generation agent that prioritizes insights and proposes timely maneuvers. This modular design enables the LLM to coordinate specialized perception and reasoning agents effectively, delivering cohesive, interpretable insights into complex autonomous driving scenarios. Extensive experiments demonstrate that DriveAgent substantially outperforms baseline methods, achieving a 26.31% improvement in vehicle reasoning and consistent gains of up to 2.85% in environmental reasoning. These results highlight the effectiveness of our LLM-driven multi-agent sensor fusion framework in boosting the robustness and reliability of autonomous driving systems.
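To make the four-stage pipeline concrete, below is a minimal sketch of how such an orchestration could be wired up. All class and function names here (SensorFrame, llm, filter_critical_timestamps, etc.) are hypothetical illustrations, not the paper's actual API, and the llm() stub stands in for a real model call; the thresholds and sensor placeholders are assumptions made for the example.

```python
# Hypothetical sketch of a DriveAgent-style four-stage pipeline.
# Names and thresholds are illustrative assumptions, not the paper's API.

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SensorFrame:
    timestamp: float            # seconds since start of drive
    camera: str                 # placeholder for an image reference
    lidar: str                  # placeholder for a point-cloud reference
    imu: Tuple[float, float, float]  # (accel_x, accel_y, accel_z)
    gps: Tuple[float, float]    # (lat, lon)


def llm(prompt: str) -> str:
    """Stub for an LLM call; a real system would query a model here."""
    return f"[LLM analysis of: {prompt[:60]}...]"


def filter_critical_timestamps(frames: List[SensorFrame],
                               accel_threshold: float = 3.0) -> List[SensorFrame]:
    """Stage (i): keep frames whose IMU readings suggest a critical event.
    The lateral/longitudinal acceleration threshold is an assumption."""
    return [f for f in frames
            if max(abs(a) for a in f.imu[:2]) > accel_threshold]


def vehicle_analysis(frame: SensorFrame) -> str:
    """Stage (ii): LiDAR and vision agents jointly assess the ego vehicle."""
    lidar_view = llm(f"LiDAR at t={frame.timestamp}: {frame.lidar}")
    vision_view = llm(f"Camera at t={frame.timestamp}: {frame.camera}")
    return llm(f"Fuse vehicle-level findings: {lidar_view} | {vision_view}")


def environment_analysis(frame: SensorFrame, vehicle_report: str) -> str:
    """Stage (iii): explain contextual changes and their likely causes."""
    return llm(f"Given {vehicle_report}, explain the scene near GPS {frame.gps}")


def decide(reports: List[str]) -> str:
    """Stage (iv): urgency-aware decision generation over all insights."""
    return llm("Prioritize insights and propose a maneuver: " + " || ".join(reports))


def drive_agent_pipeline(frames: List[SensorFrame]) -> str:
    """Run the four stages end to end over a window of sensor frames."""
    reports = []
    for frame in filter_critical_timestamps(frames):
        vehicle_report = vehicle_analysis(frame)
        reports.append(environment_analysis(frame, vehicle_report))
    return decide(reports)


if __name__ == "__main__":
    frames = [
        SensorFrame(0.0, "img_000", "pc_000", (0.1, 0.0, 9.8), (37.77, -122.42)),
        SensorFrame(0.5, "img_001", "pc_001", (4.2, 0.3, 9.8), (37.77, -122.42)),
    ]
    print(drive_agent_pipeline(frames))
```

The key design point the sketch illustrates is the separation of concerns: timestamp filtering narrows the data before any LLM call, the vehicle-level and environmental agents each see only the context they need, and the decision agent operates over the agents' textual reports rather than raw sensor streams.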