This paper introduces VALD, a training-free defense against adversarial attacks on LVLMs that combines image transformations with agentic data consolidation. VALD uses a two-stage detection mechanism: it first filters out clean inputs by checking image consistency under transformations, then examines discrepancies in a text-embedding space. By consolidating multiple responses and invoking a powerful LLM only when necessary, VALD achieves state-of-the-art accuracy against adversarial examples with minimal overhead.
A surprisingly effective defense against LVLM image attacks can be built without any training, using only image transformations and strategic LLM prompting.
Large Vision-Language Models (LVLMs) can be vulnerable to adversarial images that subtly bias their outputs toward plausible yet incorrect responses. We introduce a general, efficient, and training-free defense that combines image transformations with agentic data consolidation to recover correct model behavior. A key component of our approach is a two-stage detection mechanism that quickly filters out the majority of clean inputs. We first assess image consistency under content-preserving transformations at negligible computational cost. For more challenging cases, we examine discrepancies in a text-embedding space. Only when necessary do we invoke a powerful LLM to resolve attack-induced divergences. Central to this step is consolidating multiple responses, leveraging both their similarities and their differences. We show that our method achieves state-of-the-art accuracy while remaining efficient: most clean images skip costly processing, and even in the presence of numerous adversarial examples, the overhead remains minimal.
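The two-stage gating and the escalation step described above can be sketched roughly as follows. This is not the authors' implementation. Several details are assumptions made only for illustration: that stage 1 compares image embeddings of the original and transformed images, that stage 2 compares text embeddings of the model's responses to those images, the use of cosine similarity, the threshold values, the prompt wording, and all function names (`route_input`, `build_consolidation_prompt`). Embeddings are passed in as plain vectors so that any image or text encoder could supply them.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def min_similarity(reference: Vector, others: List[Vector]) -> float:
    """Smallest cosine similarity between the reference and any other vector."""
    if not others:
        raise ValueError("need at least one vector to compare against")
    return min(cosine(reference, v) for v in others)


def route_input(
    original_image_emb: Vector,
    transformed_image_embs: List[Vector],
    get_response_text_embs: Callable[[], List[Vector]],
    image_threshold: float = 0.95,  # hypothetical value
    text_threshold: float = 0.85,  # hypothetical value
) -> str:
    """Hypothetical two-stage gate: 'accept' keeps the normal answer,
    'escalate' hands the responses to a stronger LLM for consolidation."""
    # Stage 1 (cheap): assume clean images keep nearly the same embedding
    # under content-preserving transformations.
    if min_similarity(original_image_emb, transformed_image_embs) >= image_threshold:
        return "accept"
    # Stage 2: only now compute text embeddings of the responses
    # (first entry assumed to be the response to the original image).
    text_embs = get_response_text_embs()
    if min_similarity(text_embs[0], text_embs[1:]) >= text_threshold:
        return "accept"
    # Responses diverge: escalate to the expensive consolidation step.
    return "escalate"


def build_consolidation_prompt(question: str, responses: List[str]) -> str:
    """Hypothetical prompt asking a stronger LLM to reconcile divergent answers
    using both what they share and where they differ."""
    lines = [
        f"Question: {question}",
        "The answers below were produced for transformed copies of the same image.",
        "Some copies may have been adversarially perturbed.",
    ]
    for i, response in enumerate(responses, start=1):
        lines.append(f"Answer {i}: {response}")
    lines.append(
        "List the details the answers agree on, flag details that appear in only "
        "one answer, and write a single answer supported by the consensus."
    )
    return "\n".join(lines)
```

Passing the stage-2 embeddings as a callable means the text encoder runs only when the stage-1 check fails, which mirrors the claim that most clean images skip costly processing.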