Beihang, CAS, Shandong Hi-speed Group Co. | Mar 17, 2026 | arXiv:2603.16495

ExpressMind: A Multimodal Pretrained Large Language Model for Expressway Operation

Zihe Wang, Yihuan Wang, Haiyang Yu, Zhiyong Cui, Xiaojian Liao, Chengcheng Wang, Yonglin Tian, Yongxin Tong

AI Summary

The paper introduces ExpressMind, a multimodal LLM pretrained on a new full-stack expressway dataset and designed to serve as a cognitive core for intelligent expressway operations. ExpressMind employs a dual-layer LLM pre-training paradigm with self-supervised and unsupervised learning, coupled with a Graph-Augmented RAG framework for dynamic knowledge indexing and an RL-aligned Chain-of-Thought (RL-CoT) mechanism for incident response reasoning. Experiments on a new multimodal expressway benchmark show that ExpressMind outperforms existing baselines in event detection, safety response generation, and complex traffic analysis.

Key Contribution

General LLMs can't handle the nuances of expressway operations, so this paper built ExpressMind, a specialized multimodal LLM that outperforms existing models in event detection, safety response, and traffic analysis.

Abstract

Current expressway operation relies on rule-based and isolated models, which limits the ability to jointly analyze knowledge across different systems. Meanwhile, Large Language Models (LLMs) are increasingly applied in intelligent transportation, advancing traffic models from algorithmic to cognitive intelligence. However, general LLMs cannot effectively understand the regulations and causal relationships of events in unconventional expressway scenarios. Therefore, this paper constructs a pre-trained multimodal large language model (MLLM) for expressways, ExpressMind, which serves as the cognitive core for intelligent expressway operations. To overcome data scarcity, this paper constructs the industry's first full-stack expressway dataset, encompassing traffic knowledge texts, emergency reasoning chains, and annotated video events. It proposes a dual-layer LLM pre-training paradigm based on self-supervised training and unsupervised learning. Additionally, it introduces a Graph-Augmented RAG framework to dynamically index the expressway knowledge base. To enhance reasoning for expressway incident response strategies, we develop an RL-aligned Chain-of-Thought (RL-CoT) mechanism that enforces consistency between model reasoning and expert problem-solving heuristics for incident handling. Finally, ExpressMind integrates a cross-modal encoder to align the dynamic feature sequences of the visual and textual channels, enabling it to understand traffic scenes in both video and image modalities. Extensive experiments on our newly released multimodal expressway benchmark demonstrate that ExpressMind comprehensively outperforms existing baselines in event detection, safety response generation, and complex traffic analysis. The code and data are available at: https://wanderhee.github.io/ExpressMind/.

Multimodal Models, Natural Language Processing, Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
