This paper introduces Talk2DM, a plug-and-play module designed to enhance vehicle-road-cloud dynamic map (VRC-DM) systems with natural language querying and commonsense reasoning capabilities. To facilitate this, the authors created VRCsim, a VRC cooperative perception simulation framework, and VRC-QA, a question-answering dataset focused on spatial reasoning in mixed-traffic scenarios. Talk2DM leverages a novel chain-of-prompt (CoP) mechanism to integrate human-defined rules with LLM knowledge, achieving high accuracy and reasonable response times with models like Qwen3:8B, Gemma3:27B, and GPT-oss.
LLMs can now understand your natural-language questions about complex traffic scenarios and reason over dynamic maps, opening up more intuitive human-machine interaction for autonomous driving.
Dynamic maps (DM) serve as the fundamental information infrastructure for vehicle-road-cloud (VRC) cooperative autonomous driving in China and Japan. By providing comprehensive traffic scene representations, DMs overcome the limitations of standalone autonomous driving systems (ADS), such as physical occlusions. Although DM-enhanced ADS have been successfully deployed in real-world applications in Japan, existing DM systems still lack a natural-language-supported (NLS) human interface, which could substantially enhance human-DM interaction. To address this gap, this paper introduces VRCsim, a VRC cooperative perception (CP) simulation framework designed to generate streaming VRC-CP data. Based on VRCsim, we construct a question-answering dataset, VRC-QA, focused on spatial querying and reasoning in mixed-traffic scenes. Building on VRCsim and VRC-QA, we further propose Talk2DM, a plug-and-play module that extends VRC-DM systems with NLS querying and commonsense reasoning capabilities. Talk2DM is built upon a novel chain-of-prompt (CoP) mechanism that progressively integrates human-defined rules with the commonsense knowledge of large language models (LLMs). Experiments on VRC-QA show that Talk2DM can switch seamlessly across different LLMs while maintaining high NLS query accuracy, demonstrating strong generalization capability. Although larger models tend to achieve higher accuracy, they incur significant efficiency degradation. Our results show that Talk2DM, powered by the Qwen3:8B, Gemma3:27B, and GPT-oss models, achieves over 93% NLS query accuracy with an average response time of only 2–5 seconds, indicating strong practical potential.
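To make the chain-of-prompt (CoP) idea concrete, the following is a minimal sketch of how a prompt chain might layer human-defined spatial rules onto an LLM query. All names here (`RULES`, `apply_rules`, `llm`, `run_cop`) are illustrative assumptions, not the paper's actual implementation, and the LLM backend is stubbed out:

```python
# Hypothetical chain-of-prompt (CoP) sketch: each stage's output feeds the
# next prompt, injecting human-defined rules before the final LLM call.
# NOTE: all identifiers below are illustrative, not from the Talk2DM paper.

# Stage-1 resource: human-defined rules for spatial terms (assumed examples).
RULES = {
    "left of": "smaller lateral coordinate in the ego-vehicle frame",
    "behind": "larger longitudinal distance from the ego vehicle",
}

def apply_rules(question: str) -> str:
    """Stage 1: expand spatial terms in the question with rule definitions."""
    hints = [f"'{term}' means {defn}" for term, defn in RULES.items()
             if term in question]
    if hints:
        return question + " [Rules: " + "; ".join(hints) + "]"
    return question

def llm(prompt: str) -> str:
    """Stub standing in for a real LLM backend (e.g. a Qwen3:8B endpoint)."""
    return f"ANSWER({prompt})"

def run_cop(question: str, scene_description: str) -> str:
    """Chain the prompts: rule grounding -> scene context -> final LLM query."""
    grounded = apply_rules(question)
    prompt = f"Scene: {scene_description}\nQuestion: {grounded}"
    return llm(prompt)
```

In this sketch the rule stage rewrites the query deterministically, so the LLM only has to resolve the remaining commonsense reasoning over the dynamic-map scene, which is one plausible reading of "progressively integrating human-defined rules with LLM knowledge."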