Search papers, labs, and topics across Lattice.
The paper introduces MCP-GUARD, a three-stage defense framework designed to secure LLM-tool interactions against vulnerabilities introduced by protocols like MCP. MCP-GUARD employs static scanning, a deep neural detector, and a fine-tuned E5-based model, achieving 96.01% accuracy in identifying adversarial prompts, culminating in an LLM arbitrator for final decision-making. To facilitate research, the authors also present MCP-ATTACKBENCH, a benchmark of 70,448 samples simulating real-world attacks on MCP-based systems.
LLM-tool integrations are riddled with security holes, but MCP-GUARD offers a practical, multi-layered defense that achieves 96% accuracy in detecting adversarial prompts.
While Large Language Models (LLMs) have achieved remarkable performance, they remain vulnerable to jailbreak. The integration of Large Language Models (LLMs) with external tools via protocols such as the Model Context Protocol (MCP) introduces critical security vulnerabilities, including prompt injection, data exfiltration, and other threats. To counter these challenges, we propose MCP-GUARD, a robust, layered defense architecture designed for LLM-tool interactions. MCP-GUARD employs a three-stage detection pipeline that balances efficiency with accuracy: it progresses from lightweight static scanning for overt threats and a deep neural detector for semantic attacks, to our fine-tuned E5-based model which achieves 96.01\% accuracy in identifying adversarial prompts. Finally, an LLM arbitrator synthesizes these signals to deliver the final decision. To enable rigorous training and evaluation, we introduce MCP-ATTACKBENCH, a comprehensive benchmark comprising 70,448 samples augmented by GPT-4. This benchmark simulates diverse real-world attack vectors that circumvent conventional defenses in the MCP paradigm, thereby laying a solid foundation for future research on securing LLM-tool ecosystems.