Mar 16, 2026

arXiv:2603.14972

Learning from Mistakes: Post-Training for Driving VLA with Takeover Data

Yinfeng Gao, Deqing Liu, Qichao Zhang, Yupeng Zheng, Haochen Tian, Guang Li, Hangjun Ye, Long Chen, Da-Wei Ding, Dongbin Zhao

AI Summary

The paper introduces TakeVLA, a post-training framework for Vision-Language-Action (VLA) models in autonomous driving that addresses limitations of existing takeover data methods. TakeVLA incorporates pre-takeover language supervision to proactively learn from mistakes and a Scenario Dreaming reinforcement fine-tuning paradigm for active exploration in reconstructed takeover scenarios. Experiments on Bench2Drive show TakeVLA achieves state-of-the-art closed-loop performance, improving driving score by 4.93 and average TTC by 11.76% compared to SimLingo.

Key Contribution

Autonomous driving models can learn to avoid accidents *before* they happen by training on expert interventions and anticipating errors.

Abstract

Current Vision-Language-Action (VLA) paradigms in end-to-end autonomous driving rely on offline training from static datasets, leaving them vulnerable to distribution shift. Recent post-training methods use takeover data to mitigate this by augmenting the dataset with high-quality expert takeover samples, yet they suffer from two key limitations: supervision restricted to the period after the takeover moments leads to policies with limited safety margins, and passive preference optimization lacks active exploration for optimal performance. In this paper, we propose TakeVLA, a novel VLA post-training framework that overcomes these shortcomings through two complementary innovations. First, we introduce pre-takeover language supervision, which allows the VLA to learn from mistakes proactively. By explicitly teaching the model what to do in error-prone situations, we cultivate a precautionary mindset that anticipates hazards early and substantially enlarges safety margins. Second, we propose Scenario Dreaming, a reinforcement fine-tuning paradigm that operates in reconstructed takeover scenarios, encouraging active exploration beyond mere preference fitting. Experiments on the Bench2Drive benchmark demonstrate that TakeVLA achieves state-of-the-art closed-loop performance, surpassing the strong VLA baseline SimLingo by 4.93 in driving score, with an enhanced safety margin as evidenced by an 11.76% increase in average TTC.

Data Curation & Synthetic Data, Multimodal Models, Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
