The paper introduces Diversity-Aware Embodied Red Teaming (DAERT), a novel framework for uncovering vulnerabilities in Vision-Language-Action (VLA) models by generating diverse and effective adversarial instructions. DAERT employs a uniform policy to explore a wider range of challenging instructions, mitigating the mode collapse issue common in RL-based red teaming. Experiments on robotic benchmarks demonstrate that DAERT significantly reduces the task success rate of state-of-the-art VLAs like π₀ and OpenVLA, highlighting critical safety blind spots.
VLA models that appear robust crumble when faced with diverse linguistic variations: a new red-teaming approach reveals a staggering drop in task success from 93% to just 6%.
Vision-Language-Action (VLA) models have achieved remarkable success in robotic manipulation. However, their robustness to linguistic nuances remains a critical, under-explored concern that poses a significant safety risk to real-world deployment. Red teaming, i.e., identifying environmental scenarios that elicit catastrophic behaviors, is an important step in ensuring the safe deployment of embodied AI agents. Reinforcement learning (RL) has emerged as a promising approach to automated red teaming that aims to uncover these vulnerabilities. However, standard RL-based adversaries often suffer from severe mode collapse due to their reward-maximizing nature: they tend to converge to a narrow set of trivial or repetitive failure patterns, failing to reveal the comprehensive landscape of meaningful risks. To bridge this gap, we propose a novel \textbf{D}iversity-\textbf{A}ware \textbf{E}mbodied \textbf{R}ed \textbf{T}eaming (\textbf{DAERT}) framework to expose the vulnerabilities of VLAs to linguistic variations. Our design evaluates a uniform policy that generates a diverse set of challenging instructions while maintaining attack effectiveness, measured by execution failures in a physical simulator. We conduct extensive experiments across different robotic benchmarks against two state-of-the-art VLAs, namely $\pi_0$ and OpenVLA. Our method consistently discovers a wider range of more effective adversarial instructions, reducing the average task success rate from 93.33\% to 5.85\%, demonstrating a scalable approach to stress-testing VLA agents and exposing critical safety blind spots before real-world deployment.
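The core idea described above, which is uniform exploration over candidate instructions scored by execution failures in simulation, can be illustrated with a minimal toy sketch. Everything here is an assumption for illustration only: the function names (`run_episode`, `red_team`), the paraphrase pool, and the stub "simulator" are hypothetical and do not reflect the paper's actual implementation.

```python
# Hypothetical sketch: uniform-coverage red teaming of a VLA policy.
# A reward-greedy RL adversary might concentrate on one failure mode;
# here every candidate instruction in the pool is evaluated equally.

def run_episode(instruction: str) -> bool:
    """Stand-in for rolling out a VLA policy in a physical simulator.
    Returns True if the task succeeds. This toy 'policy' only
    understands the phrase 'pick up', mimicking linguistic brittleness."""
    return "pick up" in instruction

def red_team(paraphrase_pool, trials_per_instruction=5):
    """Evaluate every paraphrase uniformly (no reward-greedy
    concentration) and keep those that reliably break the policy."""
    failures = {}
    for instr in paraphrase_pool:  # uniform coverage of the pool
        successes = sum(run_episode(instr)
                        for _ in range(trials_per_instruction))
        rate = successes / trials_per_instruction
        if rate < 0.5:  # effective adversarial instruction
            failures[instr] = rate
    return failures

pool = [
    "pick up the red block",
    "grasp the red block",    # synonym the toy policy misreads
    "lift the crimson cube",  # rarer phrasing
]
adversarial = red_team(pool)
print(sorted(adversarial))
```

In this sketch the two synonymous phrasings surface as adversarial while the canonical instruction does not; the paper's framework additionally enforces diversity among the discovered instructions and measures failures in a real physics simulator rather than a string-matching stub.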