ClawGuard is a runtime security framework that defends tool-augmented LLM agents against indirect prompt injection, in which adversaries plant malicious instructions in tool-returned content. It enforces a user-confirmed rule set at every tool-call boundary and derives task-specific access constraints from the user's stated objective. Experiments across five LLMs demonstrate that ClawGuard blocks web and local content, MCP server, and skill file injection attacks without compromising agent utility or requiring model modification.
Stop relying on alignment for LLM agent security: ClawGuard offers deterministic protection against indirect prompt injection by enforcing user-defined rules at tool-call boundaries.
Tool-augmented Large Language Model (LLM) agents have demonstrated impressive capabilities in automating complex, multi-step real-world tasks, yet remain vulnerable to indirect prompt injection. Adversaries exploit this weakness by embedding malicious instructions within tool-returned content, which agents directly incorporate into their conversation history as trusted observations. This vulnerability manifests across three primary attack channels: web and local content injection, MCP server injection, and skill file injection. To address these vulnerabilities, we introduce ClawGuard, a novel runtime security framework that enforces a user-confirmed rule set at every tool-call boundary, transforming unreliable alignment-dependent defense into a deterministic, auditable mechanism that intercepts adversarial tool calls before any real-world effect is produced. By automatically deriving task-specific access constraints from the user's stated objective prior to any external tool invocation, ClawGuard blocks all three injection pathways without model modification or infrastructure change. Experiments across five state-of-the-art language models on AgentDojo, SkillInject, and MCPSafeBench demonstrate that ClawGuard achieves robust protection against indirect prompt injection without compromising agent utility. This work establishes deterministic tool-call boundary enforcement as an effective defense mechanism for secure agentic AI systems, requiring neither safety-specific fine-tuning nor architectural modification. Code is publicly available at https://github.com/Claw-Guard/ClawGuard.
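
To make the idea of enforcement at the tool-call boundary concrete, the following is a minimal sketch in Python. It is not ClawGuard's implementation or API: the names `Rule`, `ToolCallGuard`, and `fetch_url`, the glob-based rule format, and the example URLs are all hypothetical, chosen only for illustration. The sketch assumes the rule set has already been derived from the user's objective and confirmed by the user; it shows only the deterministic check that runs before a tool is executed.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical allow rule: a tool name plus a glob over its target argument."""
    tool: str
    target_glob: str


@dataclass
class ToolCallGuard:
    """Illustrative guard that checks every tool call against an allow-list.

    Assumption: `rules` was derived from the user's stated objective and
    confirmed by the user before any external tool was invoked.
    """
    rules: List[Rule]
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)

    def is_allowed(self, tool: str, target: str) -> bool:
        # Deny by default: a call passes only if some rule matches it.
        return any(
            rule.tool == tool and fnmatchcase(target, rule.target_glob)
            for rule in self.rules
        )

    def call(self, tools: Dict[str, Callable[[str], str]], tool: str, target: str) -> str:
        # The check happens before the tool runs, so a blocked call
        # never produces a real-world effect.
        if not self.is_allowed(tool, target):
            self.audit_log.append(("blocked", tool, target))
            raise PermissionError(f"blocked tool call: {tool}({target!r})")
        self.audit_log.append(("allowed", tool, target))
        return tools[tool](target)


if __name__ == "__main__":
    # Hypothetical task: "summarize the docs at docs.example.com".
    guard = ToolCallGuard(rules=[Rule("fetch_url", "https://docs.example.com/*")])
    tools = {"fetch_url": lambda url: f"<contents of {url}>"}

    print(guard.call(tools, "fetch_url", "https://docs.example.com/intro"))
    try:
        # A call an injected instruction might trigger; it matches no rule.
        guard.call(tools, "fetch_url", "https://attacker.example.net/exfil")
    except PermissionError as err:
        print(err)
    print(guard.audit_log)
```

Because the decision depends only on the rule set and the call arguments, never on the model's judgment of the injected text, the same call always receives the same verdict, and the audit log records every decision for later review.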