This paper identifies that hallucination in VLMs under object-hiding attacks arises from semantic discontinuity, not merely object absence. The authors propose a background-consistent object concealment attack that re-encodes object representations to match surrounding background regions, preserving token structure and attention flow. Experiments show the method reduces grounded hallucination by up to 3x compared to attention-suppression-based attacks while preserving up to 86% of non-target objects.
Object-hiding attacks on VLMs don't need to trigger hallucinations: by re-encoding objects to match their background, you can conceal them more effectively.
Vision-language models (VLMs) have recently shown remarkable capabilities in visual understanding and generation, but remain vulnerable to adversarial manipulations of visual content. Prior object-hiding attacks primarily rely on suppressing or blocking region-specific representations, often creating semantic gaps that inadvertently induce hallucination, where models invent plausible but incorrect objects. In this work, we demonstrate that hallucination arises not from object absence per se, but from semantic discontinuity introduced by such suppression-based attacks. We propose a new class of \emph{background-consistent object concealment} attacks, which hide target objects by re-encoding their visual representations to be statistically and semantically consistent with surrounding background regions. Crucially, our approach preserves token structure and attention flow, avoiding representational voids that trigger hallucination. We present a pixel-level optimization framework that enforces background-consistent re-encoding across multiple transformer layers while preserving global scene semantics. Extensive experiments on state-of-the-art vision-language models show that our method effectively conceals target objects while preserving up to $86\%$ of non-target objects and reducing grounded hallucination by up to $3\times$ compared to attention-suppression-based attacks.
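The core idea, matching object-region features to background statistics across multiple encoder layers while leaving non-target tokens untouched, can be illustrated with a minimal sketch. Everything below is hypothetical: a two-layer orthonormal linear "encoder" stands in for a frozen pretrained vision backbone, the token shapes and the `concealment_loss` objective are illustrative choices, and the paper's actual pixel-level optimization against a real VLM is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a frozen two-layer vision encoder: orthonormal linear maps
# over patch tokens. (Hypothetical setup; the paper attacks a real pretrained
# VLM encoder, which we do not model here.)
n_tokens, d = 16, 8
W1, _ = np.linalg.qr(rng.normal(size=(d, d)))
W2, _ = np.linalg.qr(rng.normal(size=(d, d)))

x = rng.normal(size=(n_tokens, d))   # clean patch embeddings
obj = np.zeros(n_tokens, dtype=bool)
obj[:4] = True                       # tokens covering the target object

def features(tokens):
    """Per-layer features of the toy encoder."""
    h1 = tokens @ W1
    return h1, h1 @ W2

def concealment_loss(delta):
    """Sum over layers of the squared distance between object-token features
    and the mean background feature at that layer (background consistency)."""
    h1, h2 = features(x + delta)
    loss = 0.0
    for h in (h1, h2):
        bg_mean = h[~obj].mean(axis=0)
        loss += 0.5 * ((h[obj] - bg_mean) ** 2).sum()
    return loss

# Gradient descent on a perturbation restricted to the object region only:
# background tokens stay untouched, so non-target content is preserved, and
# object tokens are pulled toward background statistics rather than suppressed.
delta = np.zeros_like(x)
M = W1 @ W2
for _ in range(200):
    h1, h2 = features(x + delta)
    m1, m2 = h1[~obj].mean(axis=0), h2[~obj].mean(axis=0)
    # Analytic gradient of the quadratic loss w.r.t. the object-token perturbation.
    grad = (h1[obj] - m1) @ W1.T + (h2[obj] - m2) @ M.T
    delta[obj] -= 0.1 * grad

print(f"loss before: {concealment_loss(np.zeros_like(x)):.4f}")
print(f"loss after:  {concealment_loss(delta):.2e}")
```

The design choice this sketch highlights is the contrast with suppression-based attacks: instead of zeroing or blocking object features (which leaves a representational void), the objective drives them toward the background's own statistics, so the token grid and attention flow remain intact.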