BAIR, Applied Intuition, Texas A&M | Feb 24, 2026 | arXiv:2602.21172

NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning

Ishaan Rawal, Shubh Gupta, Yihan Hu, Wei Zhan

AI Summary

The paper introduces NoRD, a Vision-Language-Action model for autonomous driving designed to reduce data and annotation requirements. NoRD achieves competitive performance on Waymo and NAVSIM using less than 60% of the data and without requiring reasoning annotations, resulting in a 3x reduction in tokens compared to existing VLAs. The key to NoRD's efficiency is the incorporation of Dr. GRPO, which mitigates difficulty bias in Group Relative Policy Optimization when training on small, reasoning-free datasets.

Key Contribution

Autonomous driving VLAs can be fine-tuned on less than 60% of the data and without any reasoning annotations, by using Dr. GRPO to mitigate the difficulty bias in Group Relative Policy Optimization.

Abstract

Vision-Language-Action (VLA) models are advancing autonomous driving by replacing modular pipelines with unified end-to-end architectures. However, current VLAs face two expensive requirements: (1) massive dataset collection, and (2) dense reasoning annotations. In this work, we address both challenges with NoRD (No Reasoning for Driving). Compared to existing VLAs, NoRD achieves competitive performance while being fine-tuned on <60% of the data and no reasoning annotations, resulting in 3× fewer tokens. We identify that standard Group Relative Policy Optimization (GRPO) fails to yield significant improvements when applied to policies trained on such small, reasoning-free datasets. We show that this limitation stems from difficulty bias, which disproportionately penalizes reward signals from scenarios that produce high-variance rollouts within GRPO. NoRD overcomes this by incorporating Dr. GRPO, a recent algorithm designed to mitigate difficulty bias in LLMs. As a result, NoRD achieves competitive performance on Waymo and NAVSIM with a fraction of the training data and no reasoning overhead, enabling more efficient autonomous systems.
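To make the difficulty-bias argument concrete, the sketch below contrasts two ways of turning a group of rollout rewards into advantages: the GRPO-style form, which centers on the group mean and divides by the group standard deviation, and the Dr. GRPO-style form, which only centers on the mean. It is a minimal illustration and not NoRD's implementation. The function names, the `eps` constant, and the reward values are assumptions made up for this example, and the sketch covers only the advantage term (Dr. GRPO also drops the response-length normalization in the loss, which is not shown here).

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: center on the group mean, divide by the group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps is an assumed small constant that guards against a zero std.
    return [(r - mu) / (sigma + eps) for r in rewards]


def dr_grpo_advantages(rewards):
    """Dr. GRPO-style advantages: center on the group mean, no std division."""
    mu = mean(rewards)
    return [r - mu for r in rewards]


# Hypothetical per-rollout rewards for two scenarios (made-up numbers).
low_variance = [0.50, 0.52, 0.48, 0.50]   # rollouts nearly agree
high_variance = [0.10, 0.90, 0.20, 0.80]  # rollouts disagree strongly

for name, group in [("low", low_variance), ("high", high_variance)]:
    print(name, "GRPO   ", [round(a, 3) for a in grpo_advantages(group)])
    print(name, "DrGRPO ", [round(a, 3) for a in dr_grpo_advantages(group)])
```

With these made-up numbers, the std-normalized advantages of the two groups end up on a similar scale even though the reward spread in the high-variance group is far larger. The mean-centered version preserves that spread, so the high-variance scenario contributes a proportionally larger learning signal.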

Multimodal Models | Robotics & Embodied AI | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 54

Year: 2026

Venue: N/A
