Aug 14, 2025arXiv:2508.10991

MCP-Guard: A Multi-Stage Defense-in-Depth Framework for Securing Model Context Protocol in Agentic AI

Wenpeng Xing, Zhonghao Qi, Yupeng Qin, Yilin Li, Caini Chang, Jiahui Yu, Changting Lin, Zhenzhen Xie, Meng Han

AI Summary

The paper introduces MCP-GUARD, a three-stage defense framework designed to secure LLM-tool interactions against vulnerabilities introduced by protocols like MCP. MCP-GUARD employs static scanning, a deep neural detector, and a fine-tuned E5-based model, achieving 96.01% accuracy in identifying adversarial prompts, culminating in an LLM arbitrator for final decision-making. To facilitate research, the authors also present MCP-ATTACKBENCH, a benchmark of 70,448 samples simulating real-world attacks on MCP-based systems.

Key Contribution

LLM-tool integrations are riddled with security holes, but MCP-GUARD offers a practical, multi-layered defense that achieves 96% accuracy in detecting adversarial prompts.

Abstract

While Large Language Models (LLMs) have achieved remarkable performance, they remain vulnerable to jailbreak. The integration of Large Language Models (LLMs) with external tools via protocols such as the Model Context Protocol (MCP) introduces critical security vulnerabilities, including prompt injection, data exfiltration, and other threats. To counter these challenges, we propose MCP-GUARD, a robust, layered defense architecture designed for LLM-tool interactions. MCP-GUARD employs a three-stage detection pipeline that balances efficiency with accuracy: it progresses from lightweight static scanning for overt threats and a deep neural detector for semantic attacks, to our fine-tuned E5-based model which achieves 96.01\% accuracy in identifying adversarial prompts. Finally, an LLM arbitrator synthesizes these signals to deliver the final decision. To enable rigorous training and evaluation, we introduce MCP-ATTACKBENCH, a comprehensive benchmark comprising 70,448 samples augmented by GPT-4. This benchmark simulates diverse real-world attack vectors that circumvent conventional defenses in the MCP paradigm, thereby laying a solid foundation for future research on securing LLM-tool ecosystems.

Constitutional AI & AI Ethics Red-Teaming & Adversarial Robustness Tool Use & Agents

Citation Metrics

Citations7

Influential citations1

References36

Year2025

VenueN/A

Related Papers

Finding related papers...

Search

MCP-Guard: A Multi-Stage Defense-in-Depth Framework for Securing Model Context Protocol in Agentic AI

Related Papers