This paper introduces a method for generating viewpoint-consistent adversarial textures on 3D objects to attack visuomotor policies. The authors combine differentiable rendering with Expectation over Transformation (EOT) and a Coarse-to-Fine (C2F) curriculum to optimize textures that remain effective under viewpoint changes. The resulting adversarial objects disrupt robot manipulation tasks in both simulation and real-world settings, demonstrating the vulnerability of visuomotor policies to 3D adversarial attacks.
Visuomotor policies are surprisingly susceptible to attack via optimized 3D object textures that are robust to viewpoint changes, even transferring to real-world settings.
Neural network-based visuomotor policies enable robots to perform manipulation tasks but remain susceptible to perceptual attacks. For example, conventional 2D adversarial patches are effective under fixed-camera setups, where appearance is relatively consistent; however, their efficacy often diminishes under dynamic viewpoints from moving cameras, such as wrist-mounted setups, due to perspective distortions. To proactively investigate potential vulnerabilities beyond 2D patches, this work proposes a viewpoint-consistent adversarial texture optimization method for 3D objects through differentiable rendering. As optimization strategies, we employ Expectation over Transformation (EOT) with a Coarse-to-Fine (C2F) curriculum, exploiting distance-dependent frequency characteristics to induce textures effective across varying camera-object distances. We further integrate saliency-guided perturbations to redirect policy attention and design a targeted loss that persistently drives robots toward adversarial objects. Our comprehensive experiments show that the proposed method is effective under various environmental conditions, while confirming its black-box transferability and real-world applicability.
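To make the EOT and C2F ideas concrete, the following is a minimal toy sketch in plain Python, not the authors' implementation. Everything in it is an assumption made for illustration: the texture is a 1D list of values, `render` is a hypothetical stand-in for a differentiable renderer in which a larger `pool` factor mimics a more distant camera (on the assumption that distant views retain mainly low-frequency content) and `gain` mimics a lighting change, and `TARGET` with its squared-error loss stands in for the real policy loss. The saliency-guided perturbations and the targeted loss from the abstract are not modelled.

```python
import random

TARGET = 0.8  # hypothetical response the toy "policy" loss pushes the view toward


def render(texture, pool, gain):
    """Toy renderer (assumption): average `pool` neighbouring texels, scale by `gain`."""
    return [gain * sum(texture[i:i + pool]) / pool
            for i in range(0, len(texture), pool)]


def loss_and_grad(texture, pool, gain):
    """Mean squared error of the rendered view to TARGET, plus its analytic gradient."""
    view = render(texture, pool, gain)
    m = len(view)
    loss = sum((v - TARGET) ** 2 for v in view) / m
    # Chain rule: each texel contributes gain / pool to exactly one view pixel.
    grad = [2.0 * (view[i // pool] - TARGET) / m * gain / pool
            for i in range(len(texture))]
    return loss, grad


def sample_transform(rng):
    """Draw a random (distance, lighting) pair; the ranges are arbitrary choices."""
    return rng.choice([1, 2, 4]), rng.uniform(0.8, 1.2)


def eot_step(texture, rng, samples=8, lr=1.0):
    """EOT: average the gradient over sampled transformations, then take one step."""
    total = [0.0] * len(texture)
    for _ in range(samples):
        pool, gain = sample_transform(rng)
        _, grad = loss_and_grad(texture, pool, gain)
        total = [t + g / samples for t, g in zip(total, grad)]
    # Clip so the texture stays a valid intensity in [0, 1].
    return [min(1.0, max(0.0, t - lr * g)) for t, g in zip(texture, total)]


def expected_loss(texture, rng, samples=200):
    """Monte Carlo estimate of the loss averaged over transformations."""
    return sum(loss_and_grad(texture, *sample_transform(rng))[0]
               for _ in range(samples)) / samples


def coarse_to_fine(resolutions=(4, 8, 16), steps=60, seed=0):
    """C2F: optimize at a coarse resolution, upsample, then refine at finer ones."""
    rng = random.Random(seed)
    texture = [0.0] * resolutions[0]
    for res in resolutions:
        factor = res // len(texture)
        # Nearest-neighbour upsampling carries the coarse solution forward.
        texture = [t for t in texture for _ in range(factor)]
        for _ in range(steps):
            texture = eot_step(texture, rng)
    return texture
```

Calling `coarse_to_fine()` returns a 16-texel texture whose expected loss under the sampled transformations is lower than that of the all-zero starting texture. In a realistic setting the analytic gradient would be replaced by automatic differentiation through an actual renderer and the victim policy.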