VLMs can be fooled by modifying fewer than 2% of image pixels in a fixed, X-shaped pattern, causing them to fail across diverse tasks such as classification, captioning, and question answering.
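To make the "fixed X-shaped pattern under 2% of pixels" concrete, here is a minimal sketch of one plausible construction: marking the two image diagonals of a square image and overwriting those pixels with a fixed value. The geometry, pixel value, and image size are illustrative assumptions, not the paper's actual perturbation or optimization procedure.

```python
import numpy as np

def x_mask(h, w):
    """Boolean mask marking the two image diagonals (an X shape)."""
    mask = np.zeros((h, w), dtype=bool)
    rows = np.arange(h)
    cols = rows * (w - 1) // (h - 1) if h > 1 else np.zeros(h, dtype=int)
    mask[rows, cols] = True             # main diagonal
    mask[rows, (w - 1) - cols] = True   # anti-diagonal
    return mask

def apply_x_trigger(img, value=255):
    """Overwrite the X-pattern pixels with a fixed value (hypothetical trigger)."""
    out = img.copy()
    out[x_mask(*img.shape[:2])] = value
    return out

# On a 100x100 image the two diagonals cover 200 of 10,000 pixels: exactly 2%.
img = np.zeros((100, 100), dtype=np.uint8)
triggered = apply_x_trigger(img)
print(f"modified fraction: {x_mask(100, 100).mean():.2%}")  # → modified fraction: 2.00%
```

For even side lengths the two diagonals do not intersect on the pixel grid, which is why the fraction works out to exactly `2/w` here; a real attack would additionally choose the perturbation values to maximize model failure rather than using a constant.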
Medical vision-language models are surprisingly brittle: clinically plausible image manipulations, like those introduced during routine image acquisition and delivery, can drastically degrade their performance.