ETH | TU Munich | Jun 16, 2026 | arXiv:2606.18043

Uncertainty Quantification for Flow-Based Vision-Language-Action Models

Ralf Römer, Maximilian Seeliger, Saida Liu, Ben Sturgis, Marco Bagatella, Daniel Marta, Andreas Krause, Angela P. Schoellig

AI Summary

This paper introduces a method for quantifying epistemic uncertainty in flow-based vision-language-action models (VLAs) using velocity-field disagreement (VFD) across a small ensemble, addressing the lack of confidence estimates that limits their reliable deployment. Building on this estimate, the authors develop SAVE, a framework for uncertainty-guided active multitask fine-tuning that reduces the number of expert demonstrations needed to adapt VLAs to new tasks. Experiments on the LIBERO benchmark show that VFD yields better-calibrated uncertainty estimates and strong failure detection, and that data acquisition with SAVE requires at least 22% fewer samples than baselines.

Key Contribution

Quantifying epistemic uncertainty in flow-based VLAs with VFD supports failure detection during deployment and uncertainty-guided fine-tuning with SAVE, which requires at least 22% fewer samples than baselines on the LIBERO benchmark.

Abstract

Vision-language-action models (VLAs) combine vision-language backbones with expressive generative action heads trained via flow matching on large-scale robotic datasets. Despite their strong empirical performance in robotic manipulation, VLAs lack mechanisms to quantify confidence in their predictions and to detect when their actions may be unreliable. This presents a critical limitation for real-world deployment in non-stationary environments, where models inevitably encounter scenarios outside their pretraining distribution and may fail without warning. To address this, we derive an efficient method for quantifying epistemic uncertainty in flow-matching models by leveraging velocity-field disagreement (VFD) across a small ensemble. We successfully use this uncertainty estimate for failure detection during deployment and active fine-tuning of flow-based VLAs. To this end, we propose SAVE, a framework for uncertainty-guided active multitask fine-tuning that reduces the number of costly expert demonstrations required to adapt VLAs to new tasks. Through extensive experiments on the LIBERO benchmark, we demonstrate that VFD yields better-calibrated uncertainty estimates predictive of downstream performance, that VFD achieves strong performance in detecting failures, and that uncertainty-guided data acquisition with SAVE requires at least 22% fewer samples than baselines. In summary, our work shows that quantifying epistemic uncertainty in flow-based VLAs improves both failure awareness and adaptation. Project website: tum-lsy.github.io/uq_vla/.
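The abstract does not spell out how VFD is computed or how SAVE chooses which data to request, so the following is only a minimal sketch of the general idea, under stated assumptions: each ensemble member is a callable velocity field v_k(x, t), disagreement is taken to be the variance of the predicted velocities across members (summed over action dimensions), and the next task to collect demonstrations for is the one with the highest mean disagreement. The function names `velocity_field_disagreement` and `pick_most_uncertain_task` are hypothetical, and the paper's actual estimator and acquisition rule may differ.

```python
import numpy as np


def velocity_field_disagreement(velocity_fns, x, t):
    """Ensemble variance of predicted velocities (an assumed stand-in for VFD).

    velocity_fns: list of K callables v_k(x, t) -> array with the shape of x
    x: array of shape (batch, action_dim), points along the flow
    t: array of shape (batch,), flow times
    Returns an array of shape (batch,) with one disagreement score per sample.
    """
    # Stack member predictions into shape (K, batch, action_dim).
    preds = np.stack([v(x, t) for v in velocity_fns], axis=0)
    # Variance over the ensemble axis, summed over action dimensions.
    return preds.var(axis=0).sum(axis=-1)


def pick_most_uncertain_task(task_scores):
    """Return the task whose samples have the highest mean disagreement.

    task_scores: dict mapping a task name to an array of disagreement scores.
    This greedy rule is an illustrative assumption, not the SAVE algorithm.
    """
    return max(task_scores, key=lambda name: float(np.mean(task_scores[name])))


# Toy ensemble: three linear velocity fields that differ only in their gain.
toy_ensemble = [lambda x, t, a=a: a * x for a in (0.9, 1.0, 1.1)]
x = np.ones((2, 4))
t = np.zeros(2)
scores = velocity_field_disagreement(toy_ensemble, x, t)
```

In this toy setup, identical members give zero disagreement, and the score grows as the members' velocity predictions spread apart, which is the behaviour one would want from an epistemic uncertainty signal.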

Multimodal Models | Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
