This paper introduces Image-based Prompt Injection (IPI), a black-box attack that embeds adversarial instructions in natural images to manipulate Multimodal Large Language Models (MLLMs). The IPI pipeline uses segmentation-based region selection, adaptive font scaling, and background-aware rendering to produce visually concealed prompts. In experiments on GPT-4-turbo with the COCO dataset, the most effective configuration reaches an attack success rate of up to 64% under stealth constraints, demonstrating a practical vulnerability in MLLMs.
Multimodal LLMs can be hijacked by adversarial instructions hidden inside seemingly innocuous images, with attack success rates of up to 64% on GPT-4-turbo.
Multimodal Large Language Models (MLLMs) integrate vision and text to power applications, but this integration introduces new vulnerabilities. We study Image-based Prompt Injection (IPI), a black-box attack in which adversarial instructions are embedded into natural images to override model behavior. Our end-to-end IPI pipeline incorporates segmentation-based region selection, adaptive font scaling, and background-aware rendering to conceal prompts from human perception while preserving model interpretability. Using the COCO dataset and GPT-4-turbo, we evaluate 12 adversarial prompt strategies and multiple embedding configurations. The results show that IPI can reliably manipulate the output of the model, with the most effective configuration achieving up to 64% attack success under stealth constraints. These findings highlight IPI as a practical threat in black-box settings and underscore the need for defenses against multimodal prompt injection.
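To make two of the pipeline stages named above more concrete, the following is a minimal sketch of adaptive font scaling and background-aware color selection. It is not the authors' implementation: the function names, the fixed glyph aspect ratio, character-level wrapping, and the per-channel offset `delta` are all illustrative assumptions, and the segmentation step is assumed to have already produced a region's bounding box and pixel colors.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def fit_font_size(text: str, box_w: int, box_h: int,
                  min_size: int = 6, max_size: int = 48,
                  char_aspect: float = 0.6, line_spacing: float = 1.2) -> int:
    """Return the largest font size (px) at which `text` fits in the box, or 0.

    Assumption: every glyph is `char_aspect * size` pixels wide and the text
    wraps at character boundaries. A real renderer would measure glyphs.
    """
    for size in range(max_size, min_size - 1, -1):
        chars_per_line = int(box_w // (char_aspect * size))
        if chars_per_line < 1:
            continue
        lines = -(-len(text) // chars_per_line)  # ceiling division
        if lines * size * line_spacing <= box_h:
            return size
    return 0


def mean_color(pixels: List[RGB]) -> RGB:
    """Average color of the selected region (integer division per channel)."""
    n = len(pixels)
    return (sum(p[0] for p in pixels) // n,
            sum(p[1] for p in pixels) // n,
            sum(p[2] for p in pixels) // n)


def concealed_text_color(background: RGB, delta: int = 12) -> RGB:
    """Pick a text color close to the background (hypothetical heuristic).

    Bright backgrounds get slightly darker text and dark backgrounds get
    slightly lighter text, so the contrast stays low for a human viewer.
    """
    r, g, b = background
    luminance = 0.299 * r + 0.587 * g + 0.114 * b
    sign = -1 if luminance >= 128 else 1

    def clamp(v: int) -> int:
        return max(0, min(255, v))

    return (clamp(r + sign * delta), clamp(g + sign * delta), clamp(b + sign * delta))
```

In this sketch, a smaller `delta` makes the text harder for a person to notice but presumably also harder for the model to read, which is the stealth versus effectiveness trade-off the abstract alludes to.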