The paper shows that end-to-end autonomous driving systems often underutilize global navigation information and rely excessively on local scene understanding, which leads to poor navigation-following in complex scenarios. To address this, the authors introduce Sequential Navigation Guidance (SNG), a representation of global navigation information that incorporates both navigation paths and turn-by-turn instructions. They also present the SNG-QA dataset and the SNG-VLA model, which fuses local and global planning and achieves state-of-the-art performance without perception-based auxiliary losses.
End-to-end driving models make surprisingly poor use of navigation information, but a new framework shows how to incorporate it and reach state-of-the-art results.
Global navigation information and local scene understanding are two crucial components of autonomous driving systems. However, our experimental results indicate that many end-to-end autonomous driving systems tend to over-rely on local scene understanding while failing to utilize global navigation information. In these systems, planning capability is only weakly correlated with the navigation input, and they struggle to follow navigation in complex scenarios. To overcome this limitation, we propose the Sequential Navigation Guidance (SNG) framework, an efficient representation of global navigation information based on real-world navigation patterns. SNG encompasses both navigation paths, which constrain long-term trajectories, and turn-by-turn (TBT) information, which drives real-time decision-making. We construct the SNG-QA dataset, a visual question answering (VQA) dataset based on SNG that aligns global and local planning. Additionally, we introduce an efficient model, SNG-VLA, that fuses local planning with global planning. SNG-VLA achieves state-of-the-art performance through precise modeling of navigation information, without requiring auxiliary loss functions from perception tasks. Project page: SNG-VLA
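To make the two parts of SNG more concrete, here is a minimal Python sketch of how a navigation path and a sequence of TBT steps might be held together and serialized into a text prompt for a VLA-style model. Everything in it is an illustrative assumption, not the authors' encoding: the class names `SNGSketch` and `TurnInstruction`, the methods `upcoming` and `to_prompt`, the ego-frame waypoint convention, the action vocabulary, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TurnInstruction:
    """One hypothetical turn-by-turn (TBT) step."""
    action: str        # hypothetical vocabulary, e.g. "turn_left", "keep_straight"
    distance_m: float  # distance ahead of the ego vehicle at which the action applies


@dataclass
class SNGSketch:
    """Hypothetical container pairing a navigation path with sequential TBT steps.

    Field names, units, and the prompt format are assumptions for illustration,
    not the SNG-VLA input format.
    """
    path: List[Tuple[float, float]]      # waypoints (x forward, y left) in metres, ego frame
    instructions: List[TurnInstruction]  # TBT steps in route order

    def upcoming(self, horizon_m: float) -> Optional[TurnInstruction]:
        """Return the nearest TBT step within horizon_m metres ahead, or None."""
        ahead = [i for i in self.instructions if 0.0 <= i.distance_m <= horizon_m]
        return min(ahead, key=lambda i: i.distance_m) if ahead else None

    def to_prompt(self, horizon_m: float = 50.0) -> str:
        """Serialize the path (long-term constraint) and next TBT step (real-time cue)."""
        pts = "; ".join(f"({x:.1f}, {y:.1f})" for x, y in self.path)
        nxt = self.upcoming(horizon_m)
        tbt = f"{nxt.action} in {nxt.distance_m:.0f} m" if nxt else "follow the road"
        return f"Navigation path: {pts}. Next instruction: {tbt}."


sng = SNGSketch(
    path=[(0.0, 0.0), (10.0, 0.0), (20.0, 2.5)],
    instructions=[TurnInstruction("turn_left", 35.0), TurnInstruction("keep_straight", 120.0)],
)
print(sng.to_prompt())
# prints: Navigation path: (0.0, 0.0); (10.0, 0.0); (20.0, 2.5). Next instruction: turn_left in 35 m.
```

The split mirrors the division of labor the abstract describes: the path constrains the long-horizon trajectory, while only the nearest TBT step within a short horizon is surfaced as the immediate decision cue.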