Search papers, labs, and topics across Lattice.
2
0
3
VLMs can be easily fooled in the real world by strategically manipulating lighting, causing them to misinterpret scenes and hallucinate nonsensical captions.
VLMs can be devastatingly fooled by modifying less than 2% of image pixels in a fixed, X-shaped pattern, causing them to fail spectacularly across diverse tasks like classification, captioning, and question answering.