Even with robust training techniques such as Expectation over Transformation (EOT), a carefully crafted adversarial patch can reliably fool VIS-IR VLMs and transfer across tasks such as classification, captioning, and VQA.
VLMs can be fooled by modifying less than 2% of image pixels in a fixed, X-shaped pattern, which causes failures across diverse tasks such as classification, captioning, and question answering.
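To give a sense of how small a sub-2% pixel budget is, here is a minimal sketch of an X-shaped mask and its coverage. Everything in it is an assumption made for illustration: the 224x224 resolution, the one-pixel stroke width, and the helper names `x_mask` and `apply_patch` are hypothetical and are not taken from the summarized work, and the patch values are a placeholder rather than an optimized adversarial pattern.

```python
import numpy as np

def x_mask(size: int, thickness: int) -> np.ndarray:
    """Boolean mask marking the two diagonals of a size x size grid.

    Hypothetical helper: the real patch geometry is not specified here.
    """
    rows, cols = np.indices((size, size))
    main_diag = np.abs(rows - cols) < thickness
    anti_diag = np.abs(rows + cols - (size - 1)) < thickness
    return main_diag | anti_diag

def apply_patch(image: np.ndarray, mask: np.ndarray, patch: np.ndarray) -> np.ndarray:
    """Overwrite only the masked pixels of `image` with values from `patch`."""
    out = image.copy()
    out[mask] = patch[mask]
    return out

# Assumed setup: a 224x224 RGB image and a one-pixel-wide X.
size = 224
mask = x_mask(size, thickness=1)
coverage = mask.mean()  # fraction of pixels inside the X

image = np.zeros((size, size, 3), dtype=np.float32)
patch = np.ones((size, size, 3), dtype=np.float32)  # placeholder, not an optimized patch
patched = apply_patch(image, mask, patch)

print(f"pixels modified: {int(mask.sum())} ({coverage:.2%} of the image)")
```

Under these assumptions the X stays under the 2% budget. Widening the stroke to `thickness=2` at the same resolution would already exceed it, which suggests how thin such a pattern has to be.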