Rutgers | Mar 18, 2026 | arXiv:2603.18342

Shifting Uncertainty to Critical Moments: Towards Reliable Uncertainty Quantification for VLA Model

Yanchuan Tang, Taowen Wang, Yue Chen, Yuefei Chen, Qiang Guan, Ruixiang Tang

AI Summary

This paper addresses unreliable uncertainty quantification in Vision-Language-Action (VLA) models for robotics, where averaging a token-level uncertainty signal over a rollout dilutes short-lived but critical uncertainty spikes. The authors propose an approach that combines max-based sliding window pooling, motion-aware stability weighting, and DoF-adaptive calibration to better capture transient risk signals and unstable behaviors. Experiments on the LIBERO benchmark show improved failure prediction accuracy, yielding more reliable failure detection signals that can support human-in-the-loop interventions.

Key Contribution

Don't let your robot's brief moment of panic get lost in the noise: this uncertainty method spotlights those critical spikes to predict failures before they happen.

Abstract

Vision-Language-Action (VLA) models enable general-purpose robotic policies by mapping visual observations and language instructions to low-level actions, but they often lack reliable introspection. A common practice is to compute a token-level uncertainty signal and take its mean over a rollout. However, mean aggregation can dilute short-lived but safety-critical uncertainty spikes in continuous control. In particular, successful rollouts may contain localized high-entropy segments due to benign noise or non-critical micro-adjustments, while failure rollouts can appear low-entropy for most timesteps and only exhibit brief spikes near the onset of failure. We propose a unified uncertainty quantification approach for predicting rollout success versus failure that (1) uses max-based sliding window pooling to preserve transient risk signals, (2) applies motion-aware stability weighting to emphasize high-frequency action oscillations associated with unstable behaviors, and (3) performs DoF-adaptive calibration via Bayesian Optimization to prioritize kinematically critical axes. Experiments on the LIBERO benchmark show that our method substantially improves failure prediction accuracy and yields more reliable signals for failure detection, which can support downstream human-in-the-loop interventions.
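The dilution effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "max-based sliding window pooling" means taking the maximum of sliding-window means over a per-timestep entropy signal, and the function names, the window size, and the two entropy traces are hypothetical values invented for illustration.

```python
from statistics import fmean


def mean_score(entropy):
    """Baseline: average per-step uncertainty over the whole rollout."""
    return fmean(entropy)


def windowed_max_score(entropy, window=3):
    """Max over sliding-window means, so a short spike dominates the score.

    Assumption: this is one plausible reading of max-based sliding window
    pooling, not necessarily the paper's exact definition.
    """
    if not 1 <= window <= len(entropy):
        raise ValueError("window must be between 1 and len(entropy)")
    return max(
        fmean(entropy[i:i + window])
        for i in range(len(entropy) - window + 1)
    )


# Hypothetical per-step entropies, invented for illustration only.
# A successful rollout with moderate, benign noise throughout:
noisy_success = [0.4, 0.5, 0.4, 0.5, 0.4, 0.5, 0.4, 0.5, 0.4, 0.5]
# A failed rollout that is calm except for a brief spike near the failure:
spiky_failure = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 1.5, 1.5, 0.1]

if __name__ == "__main__":
    for name, trace in [("success", noisy_success), ("failure", spiky_failure)]:
        print(name, round(mean_score(trace), 3), round(windowed_max_score(trace), 3))
```

On these made-up traces, the mean ranks the successful rollout as more uncertain than the failed one, while the windowed maximum ranks the failed rollout higher. The motion-aware weighting and DoF-adaptive calibration steps are not sketched here, since the abstract does not specify their form.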

Eval Frameworks & Benchmarks, Multimodal Models, Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 23

Year: 2026

Venue: N/A
