ByteDance · Northern Arizona University · Mar 4, 2026 · arXiv:2603.03637

Image-based Prompt Injection: Hijacking Multimodal LLMs through Visually Embedded Adversarial Instructions

N. Nagaraja, Neha Nagaraja, Lan Zhang, Zhilong Wang, Bo Zhang, Pawan Patil

AI Summary

This paper introduces Image-based Prompt Injection (IPI), a black-box attack that embeds adversarial instructions within images to manipulate Multimodal Large Language Models (MLLMs). The IPI pipeline uses segmentation-based region selection, adaptive font scaling, and background-aware rendering to create visually concealed prompts. Experiments with GPT-4-turbo on the COCO dataset show that IPI achieves up to 64% attack success, demonstrating a practical vulnerability in MLLMs.
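The summary does not spell out how adaptive font scaling works. As a minimal sketch, assume a region has already been chosen (for example by the segmentation step) and that each glyph is roughly `char_w_ratio` × font size wide, with lines `line_h_ratio` × font size tall. Both ratios are hypothetical defaults, not values from the paper, and `fit_font_size` and its helpers are illustrative names only. Under that model, the largest integer font size whose greedily word-wrapped text fits the region's bounding box can be found by binary search.

```python
import math
from typing import Optional


def wrapped_line_count(text: str, max_chars: int) -> Optional[int]:
    """Greedy word wrap: number of lines needed, or None if some word cannot fit on a line."""
    if max_chars <= 0:
        return None
    lines, used = 0, 0  # `used` = characters already placed on the current line
    for word in text.split():
        if len(word) > max_chars:
            return None  # a single word wider than the box can never fit
        if used == 0:
            lines, used = lines + 1, len(word)  # start the first line
        elif used + 1 + len(word) <= max_chars:
            used += 1 + len(word)  # append to the current line after one space
        else:
            lines, used = lines + 1, len(word)  # break to a new line
    return lines


def fits(text: str, size: int, box_w: float, box_h: float,
         char_w_ratio: float, line_h_ratio: float) -> bool:
    """True if `text` at `size` px fits a box_w x box_h region (hypothetical glyph model)."""
    max_chars = math.floor(box_w / (size * char_w_ratio))
    n_lines = wrapped_line_count(text, max_chars)
    return n_lines is not None and n_lines * size * line_h_ratio <= box_h


def fit_font_size(text: str, box_w: float, box_h: float, min_size: int = 6,
                  max_size: int = 72, char_w_ratio: float = 0.6,
                  line_h_ratio: float = 1.2) -> Optional[int]:
    """Largest integer font size in [min_size, max_size] that fits, or None if none does."""
    args = (box_w, box_h, char_w_ratio, line_h_ratio)
    if not fits(text, min_size, *args):
        return None
    lo, hi = min_size, max_size  # invariant: `lo` always fits
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if fits(text, mid, *args):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

For example, `fit_font_size("hello world", 200, 40)` returns 30 under the default ratios: at 31 px only 10 characters fit per line, which forces a second line that overflows the 40 px height. Binary search is valid here because a smaller font never needs more lines, so "fits" is monotone in font size.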

Key Contribution

Multimodal LLMs can be hijacked by adversarial instructions hidden inside seemingly innocuous images; the most effective configuration achieves up to a 64% attack success rate under stealth constraints.
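The 64% figure is, per the abstract, the success rate of the most effective configuration. Assuming each trial (one image, one prompt strategy, one embedding configuration) is judged simply as a success or a failure (the paper's judging criterion is not described here), the per-configuration rate and the best configuration can be computed as below. The function names, configuration labels, and outcomes are made-up toy values for illustration, not results from the paper.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def attack_success_rates(trials: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Map each configuration label to successful trials / total trials."""
    totals: Counter = Counter()
    successes: Counter = Counter()
    for config, succeeded in trials:
        totals[config] += 1
        successes[config] += int(succeeded)
    return {cfg: successes[cfg] / totals[cfg] for cfg in totals}


def best_configuration(rates: Dict[str, float]) -> Tuple[str, float]:
    """Configuration with the highest success rate (the 'up to X%' headline figure)."""
    return max(rates.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Toy outcomes only; not data from the paper.
    toy_trials = [("cfg-a", True), ("cfg-a", False),
                  ("cfg-b", True), ("cfg-b", True), ("cfg-b", False), ("cfg-b", True)]
    rates = attack_success_rates(toy_trials)
    print(rates)                      # {'cfg-a': 0.5, 'cfg-b': 0.75}
    print(best_configuration(rates))  # ('cfg-b', 0.75)
```

Reporting the maximum over configurations, as "up to" suggests, is an optimistic summary; the per-configuration dictionary keeps the weaker settings visible as well.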

Abstract

Multimodal Large Language Models (MLLMs) integrate vision and text to power applications, but this integration introduces new vulnerabilities. We study Image-based Prompt Injection (IPI), a black-box attack in which adversarial instructions are embedded into natural images to override model behavior. Our end-to-end IPI pipeline incorporates segmentation-based region selection, adaptive font scaling, and background-aware rendering to conceal prompts from human perception while preserving model interpretability. Using the COCO dataset and GPT-4-turbo, we evaluate 12 adversarial prompt strategies and multiple embedding configurations. The results show that IPI can reliably manipulate the output of the model, with the most effective configuration achieving up to 64% attack success under stealth constraints. These findings highlight IPI as a practical threat in black-box settings and underscore the need for defenses against multimodal prompt injection.
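Background-aware rendering aims to keep the embedded text hard for people to see while the model can still read it. The summary does not say how stealth is measured; one plausible proxy, assumed here rather than taken from the paper, is the WCAG 2.x contrast ratio between the text color and the mean color of the surrounding region (1.0 means identical colors, 21.0 is black on white). The helper names and the 1.5 threshold are hypothetical. The sketch uses the sRGB linearization threshold 0.04045; the WCAG 2.x text lists 0.03928, and for integer 8-bit channel values the two give identical results.

```python
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def _linearize(channel: float) -> float:
    """sRGB channel in 0..255 -> linear-light value in 0..1."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    """Relative luminance with the Rec. 709 / WCAG channel weights."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: RGB, bg: RGB) -> float:
    """WCAG 2.x contrast ratio: (L_lighter + 0.05) / (L_darker + 0.05), in [1, 21]."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def mean_color(pixels: Sequence[RGB]) -> RGB:
    """Average color of a region given as a non-empty list of (R, G, B) pixels."""
    n = len(pixels)
    return (sum(p[0] for p in pixels) / n,
            sum(p[1] for p in pixels) / n,
            sum(p[2] for p in pixels) / n)


def is_low_contrast(text_rgb: RGB, region_pixels: Sequence[RGB],
                    threshold: float = 1.5) -> bool:
    """Hypothetical stealth check: ratio to the region's mean color is below `threshold`."""
    return contrast_ratio(text_rgb, mean_color(region_pixels)) < threshold
```

For instance, near-white text (250, 250, 250) on a pure white region gives a ratio of about 1.04 and counts as low-contrast under this threshold, while black on white (21.0) does not. The same measurement could also serve a defender looking for near-invisible text in incoming images.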

Computer Vision · Multimodal Models · Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 28

Year: 2025

Venue: 2025 3rd International Conference on Foundation and Large Language Models (FLLM)
