Apr 14, 2026 · arXiv:2604.12208

Unveiling the Surprising Efficacy of Navigation Understanding in End-to-End Autonomous Driving

Zhihua Hua, Junli Wang, Pengfei Li, Qihao Jin, Bo Zhang, Kehua Sheng, Yilun Chen, Zhongxue Gan, Wenchao Ding

AI Summary

The paper reveals that end-to-end autonomous driving systems often underutilize global navigation information and rely excessively on local scene understanding, which leads to poor navigation-following in complex scenarios. To address this, the authors introduce Sequential Navigation Guidance (SNG), a representation of global navigation information that incorporates both navigation paths and turn-by-turn instructions. They also present the SNG-QA dataset and the SNG-VLA model, which fuses local and global planning and achieves state-of-the-art performance without perception-based auxiliary losses.
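One way to make the "underutilized navigation" finding concrete is a perturbation probe: plan the same scene under different navigation commands and measure how far the resulting trajectories diverge. The sketch below is a hypothetical diagnostic, not the paper's evaluation protocol. The planner interface (a function from a command string to a list of waypoints), the command set, and both toy planners are assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]             # (lateral x, longitudinal y) in metres
Planner = Callable[[str], List[Point]]  # hypothetical: navigation command -> trajectory


def mean_displacement(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Average Euclidean distance between corresponding waypoints of two trajectories."""
    if not a or len(a) != len(b):
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def navigation_sensitivity(planner: Planner, commands: Sequence[str]) -> float:
    """Mean pairwise displacement between trajectories planned under different commands.

    A value near zero means the planner's output barely depends on navigation input.
    """
    if len(commands) < 2:
        raise ValueError("need at least two navigation commands to compare")
    trajs = [planner(c) for c in commands]
    pairs = [(i, j) for i in range(len(trajs)) for j in range(i + 1, len(trajs))]
    return sum(mean_displacement(trajs[i], trajs[j]) for i, j in pairs) / len(pairs)


def scene_only_planner(command: str) -> List[Point]:
    # Toy planner that ignores the command and always drives straight ahead.
    return [(0.0, float(t)) for t in range(1, 6)]


def nav_aware_planner(command: str) -> List[Point]:
    # Toy planner that drifts laterally in the commanded direction.
    lateral = {"left": -1.0, "straight": 0.0, "right": 1.0}[command]
    return [(lateral * 0.5 * t, float(t)) for t in range(1, 6)]


commands = ["left", "straight", "right"]
print(navigation_sensitivity(scene_only_planner, commands))  # 0.0
print(navigation_sensitivity(nav_aware_planner, commands))   # 2.0
```

The scene-only toy planner scores 0.0 because its trajectory never depends on the command, while the navigation-aware toy planner scores 2.0. A real end-to-end model whose score stays close to zero across navigation perturbations would show the kind of weak navigation dependence the summary describes.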

Key Contribution

End-to-end driving models make surprisingly little use of navigation input; the SNG framework shows how to inject it effectively and reach state-of-the-art results.

Abstract

Global navigation information and local scene understanding are two crucial components of autonomous driving systems. However, our experimental results indicate that many end-to-end autonomous driving systems tend to over-rely on local scene understanding while failing to utilize global navigation information. These systems exhibit weak correlation between their planning capabilities and navigation input, and struggle to perform navigation-following in complex scenarios. To overcome this limitation, we propose the Sequential Navigation Guidance (SNG) framework, an efficient representation of global navigation information based on real-world navigation patterns. SNG encompasses both navigation paths for constraining long-term trajectories and turn-by-turn (TBT) information for real-time decision-making. We construct the SNG-QA dataset, a visual question answering (VQA) dataset based on SNG that aligns global and local planning. Additionally, we introduce SNG-VLA, an efficient model that fuses local planning with global planning. SNG-VLA achieves state-of-the-art performance through precise navigation information modeling without requiring auxiliary loss functions from perception tasks. Project page: SNG-VLA
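To illustrate the two halves of SNG described in the abstract, here is a minimal sketch of a container holding a navigation path (for long-term trajectory constraints) and turn-by-turn instructions (for near-term decisions), serialized into text that a VLA-style model could read. The paper's actual SNG encoding is not given here; the class and field names, the maneuver vocabulary, the 200 m decision horizon, and the prompt format are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TurnByTurn:
    maneuver: str      # hypothetical vocabulary, e.g. "turn_left", "keep_straight"
    distance_m: float  # distance from the ego vehicle to the maneuver point


@dataclass
class SequentialNavigationGuidance:
    # Route waypoints in the ego frame (metres): constrains the long-term trajectory.
    path: List[Tuple[float, float]]
    # Turn-by-turn instructions: drive real-time decisions.
    instructions: List[TurnByTurn] = field(default_factory=list)

    def upcoming(self, horizon_m: float) -> List[TurnByTurn]:
        """TBT instructions within the decision horizon, nearest first."""
        nearby = (i for i in self.instructions if 0 <= i.distance_m <= horizon_m)
        return sorted(nearby, key=lambda i: i.distance_m)

    def to_prompt(self, horizon_m: float = 200.0, max_points: int = 5) -> str:
        """Serialize to a text prompt (hypothetical format), subsampling the path."""
        step = max(1, len(self.path) // max_points)
        pts = self.path[::step][:max_points]
        path_txt = ", ".join(f"({x:.1f}, {y:.1f})" for x, y in pts)
        tbt = "; ".join(f"{i.maneuver} in {i.distance_m:.0f} m"
                        for i in self.upcoming(horizon_m))
        return f"Navigation path: [{path_txt}]. Upcoming maneuvers: {tbt or 'none'}."


sng = SequentialNavigationGuidance(
    path=[(0.0, float(y)) for y in range(0, 50, 10)],
    instructions=[
        TurnByTurn("turn_left", 120.0),
        TurnByTurn("keep_straight", 40.0),
        TurnByTurn("turn_right", 350.0),
    ],
)
print(sng.to_prompt())
# Navigation path: [(0.0, 0.0), (0.0, 10.0), (0.0, 20.0), (0.0, 30.0), (0.0, 40.0)]. Upcoming maneuvers: keep_straight in 40 m; turn_left in 120 m.
```

The 350 m instruction falls outside the assumed horizon and is dropped, so the model sees the full route shape but only the maneuvers it must act on soon. This mirrors the split the abstract draws between paths that constrain long-term trajectories and TBT information for real-time decisions.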

Computer Vision · Robotics & Embodied AI · World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 41

Year: 2026

Venue: N/A
