This paper introduces Test-time Adversarial Takeover (TAKO), an attack that lets adversaries steer robotic diffusion policies in real time by exploiting vulnerabilities in their visual conditioning. Unlike prior adversarial attacks, which mainly disrupt performance, TAKO gives the attacker a steering interface over the robot's actions by switching among learned universal patches in the camera stream. Human operators achieve 100% takeover success in every evaluated setting (four tasks, two visual encoders, and three generative inference families), highlighting a critical security risk in embodied AI systems.
Adversaries can hijack robotic diffusion policies in real time: human operators achieve 100% takeover success on attacker-defined objectives in every evaluated setting.
Diffusion-based action generation has become a foundational component of embodied AI, but its reliance on visual conditioning leaves deployed visuomotor policies vulnerable to adversarial manipulation. Most prior attacks focus on disruption: they perturb the observation stream to reduce task success or induce erratic behavior. We study a stronger threat, Test-time Adversarial Takeover (TAKO), in which an attacker obtains a real-time steering interface over a frozen robot policy and turns it into a remotely piloted instrument. TAKO learns a small vocabulary of reusable universal patches through differentiable diffusion inference; at test time, the attacker switches among these patches in the camera stream to compose attacker-chosen trajectories. This works because the perturbation acts on the visual conditioning pathway, where the induced bias can persist through iterative generative inference. We further show that the natural targeted baseline, target-policy matching, fails because the victim policy cannot reliably supervise itself on out-of-distribution target shifts. Across four tasks (2D manipulation, simulated aerial delivery, simulated ground navigation, and physical-world ground navigation), two visual encoders (ResNet-18 and EfficientNet-B0 + Transformer), and three generative inference families (DDPM, DDIM, and flow matching), human operators achieve 100% takeover success on attacker-defined objectives in every evaluated setting. The project page is available at https://tako-attack.github.io.
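To make the test-time step in the abstract concrete, the sketch below shows only the mechanics of switching among a patch vocabulary in a camera stream. It is an illustration under stated assumptions, not the authors' implementation: the command names, the 16x16 patch size, the top-left placement, and the helpers `apply_patch` and `attacked_stream` are all hypothetical, and random pixels stand in for the learned universal patches, which the abstract says are obtained through differentiable diffusion inference (not shown here).

```python
import numpy as np


def apply_patch(frame, patch, top, left):
    """Return a copy of `frame` with `patch` pasted at (top, left)."""
    out = frame.copy()
    h, w = patch.shape[:2]
    out[top:top + h, left:left + w] = patch
    return out


# Hypothetical patch vocabulary: one patch per steering command.
# Random pixels are placeholders for learned universal patches.
rng = np.random.default_rng(0)
COMMANDS = ("left", "right", "forward", "stop")
patches = {
    name: rng.integers(0, 256, size=(16, 16, 3), dtype=np.uint8)
    for name in COMMANDS
}


def attacked_stream(frames, commands, top=0, left=0):
    """Yield each frame with the patch selected by that step's command."""
    for frame, command in zip(frames, commands):
        yield apply_patch(frame, patches[command], top, left)


# Three blank 96x96 RGB frames stand in for the camera stream.
frames = [np.zeros((96, 96, 3), dtype=np.uint8) for _ in range(3)]
out = list(attacked_stream(frames, ["left", "left", "forward"]))
```

In this sketch the policy never appears: the frozen victim policy would simply consume the patched frames as its visual conditioning input, and the operator's sequence of commands would determine which patch is visible at each step.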