This paper investigates the vulnerability of multimodal memory in web agents to poisoning attacks and introduces MemVenom, a black-box attack framework. The method uses a two-stage design that combines trigger-conditioned retrieval with post-retrieval induction to inject malicious content into external memory, influencing agent behavior without altering model parameters. Experiments show that MemVenom achieves up to 99.15% attack success on GPT-5-family web agents with minimal impact on benign performance, highlighting a significant security risk in current web-agent systems.
Malicious content can be injected into web agents' external memory, achieving up to 99.15% attack success with minimal impact on benign performance.
External memory has become a core component of modern web agents, enabling long-horizon reasoning through the retrieval of past experiences. However, this paradigm introduces a critical vulnerability: malicious content injected into memory can be persistently recalled and repeatedly influence agent behavior. In this work, we identify and systematically study multimodal memory poisoning, an overlooked yet practical attack surface in web-agent systems. We propose MemVenom, a unified black-box attack framework that poisons graph-structured external memory with coordinated text-image evidence. Our method consists of a two-stage design: (1) a trigger-conditioned retrieval attack that ensures high-probability recall of malicious memory, and (2) a post-retrieval attack-induction stage that leverages adversarial perturbations and stealthy OCR injection to override the original user objective. Unlike prior attacks that operate on prompts or text-only memory, our approach enables persistent, reusable, and goal-agnostic attacks without modifying model parameters or re-optimizing malicious tasks. Experiments across multiple web-agent frameworks and vision-language models demonstrate that MemVenom achieves strong end-to-end attack success with minimal impact on benign performance, reaching up to 99.15% attack success on GPT-5-family web agents and transferring effectively across architectures and model scales.
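To make the retrieval side of this threat model concrete, the toy sketch below shows why similarity-based memory recall can be steered by a trigger: an entry written to overlap with trigger queries ranks first for those queries while leaving benign queries unaffected. This is an illustration only, not MemVenom itself. The bag-of-words `embed` function, the flat list used as memory, the example strings, and all function names are assumptions made for this example; the paper's graph-structured multimodal memory and its second (post-retrieval) stage are not modeled.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumed stand-in for a real encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(memory, query, k=1):
    """Return the k memory entries most similar to the query."""
    q = embed(query)
    ranked = sorted(memory, key=lambda m: cosine(embed(m), q), reverse=True)
    return ranked[:k]


# Hypothetical benign experiences stored by a web agent.
memory = [
    "search for a laptop then sort results by price",
    "open the cart page and click checkout",
    "book a flight then select the cheapest fare",
]

# Hypothetical injected entry, written to overlap with trigger queries.
poisoned = "book a hotel room then select the cheapest option"
memory.append(poisoned)

trigger_query = "book a hotel room"
benign_query = "search for a laptop"

top_trigger = retrieve(memory, trigger_query)[0]
top_benign = retrieve(memory, benign_query)[0]

print("trigger query recalls injected entry:", top_trigger == poisoned)
print("benign query recalls injected entry:", top_benign == poisoned)
```

In this toy setting the injected entry is recalled only when the query contains the trigger phrase, which is the property that lets a poisoned memory stay dormant during benign use. The same ranking function also suggests a simple defensive check: auditing which stored entries dominate retrieval for sensitive query patterns.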