This paper introduces a hierarchical real-world test platform designed for evaluating Vision-Language Model (VLM)-integrated autonomous driving systems, addressing the limitations of existing simulation and dataset-driven approaches in replicating real-world complexities. The platform features a lightweight middleware pipeline for VLM integration, a modular architecture for flexible component substitution, and closed-loop scenario-based testing on a controlled track. The effectiveness of the platform is demonstrated through a real-world case study, showcasing its ability to assess the performance and robustness of VLM-integrated autonomous driving under diverse conditions.
Finally, a real-world testing platform that can rigorously evaluate full-stack VLM-integrated autonomous driving systems, offering configurable scenarios and closed-loop control.
Vision-Language Models (VLMs) have demonstrated significant promise for autonomous driving due to their powerful multimodal reasoning capabilities. However, adapting VLMs from generic data to safety-critical driving contexts introduces a notable challenge known as domain shift. Existing simulation-based and dataset-driven evaluation approaches struggle to replicate real-world complexities, lacking repeatable closed-loop evaluation and flexible scenario manipulation. Furthermore, current real-world testing platforms typically focus on isolated modules and do not support comprehensive interaction with VLM-based systems. Consequently, there is a critical need for a holistic testing architecture that integrates perception, planning, and control modules, accommodates VLM-based systems, and supports configurable real-world testing scenarios. In this paper, we address this gap by proposing a hierarchical real-world test platform specialized for the rigorous evaluation of VLM-integrated autonomous driving systems. Specifically, our platform features: a lightweight, structured, and low-latency middleware pipeline for seamless VLM integration; a hierarchical modular architecture enabling flexible substitution between conventional and VLM-based autonomy components, providing deployment flexibility for rapid experimentation; and closed-loop scenario-based testing capabilities on a controlled test track, facilitating evaluation of the full-stack VLM-integrated autonomous driving pipeline, from perception, reasoning, decision-making, and planning to final vehicle maneuvers. Through an extensive real-world case study, we demonstrate the effectiveness of our platform in evaluating the performance and robustness of VLM-integrated autonomous driving under diverse realistic conditions. Project page and code: https://github.com/YupengZhouPurdue/VLMTest
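The modular substitution idea in the abstract can be sketched as a common planner interface behind which a conventional rule-based component and a VLM-backed component are interchangeable, so the closed loop itself never changes. This is a minimal illustrative sketch, not the platform's actual API; all class names, fields, and the injected `query_fn` client are assumptions.

```python
# Illustrative sketch (hypothetical names): perception output and plans flow
# through a fixed interface, so conventional and VLM-based planners can be
# swapped without touching the closed-loop driver code.
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Observation:
    scene_desc: str  # stand-in for camera frames / perception output


@dataclass
class Plan:
    maneuver: str  # e.g. "keep_lane", "stop"


class Planner(ABC):
    """Common interface for any planning component."""

    @abstractmethod
    def plan(self, obs: Observation) -> Plan: ...


class RuleBasedPlanner(Planner):
    """Conventional baseline: simple rule lookup."""

    def plan(self, obs: Observation) -> Plan:
        return Plan("stop" if "pedestrian" in obs.scene_desc else "keep_lane")


class VLMPlanner(Planner):
    """VLM-backed planner: queries a model via an injected middleware client."""

    def __init__(self, query_fn):
        self.query_fn = query_fn  # assumed VLM client: prompt str -> maneuver str

    def plan(self, obs: Observation) -> Plan:
        return Plan(self.query_fn(f"Scene: {obs.scene_desc}. Next maneuver?"))


def closed_loop_step(planner: Planner, obs: Observation) -> Plan:
    """One tick of the closed loop: observation -> plan -> (actuation)."""
    return planner.plan(obs)


if __name__ == "__main__":
    obs = Observation("pedestrian ahead")
    # Swapping components requires no change to the loop itself:
    print(closed_loop_step(RuleBasedPlanner(), obs).maneuver)            # stop
    print(closed_loop_step(VLMPlanner(lambda p: "stop"), obs).maneuver)  # stop
```

Dependency injection of the VLM client is what keeps the middleware lightweight here: the loop depends only on the `Planner` interface, mirroring the flexible-substitution property the abstract claims.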