The paper introduces CanaryRAG, a runtime defense mechanism against RAG knowledge base leakage attacks that leverages canary tokens embedded in retrieved chunks. It frames the defense as a dual-path runtime integrity game, monitoring both target and oracle paths for canary behavior violations to detect leakage attempts. Experiments show CanaryRAG significantly reduces chunk recovery rates compared to existing methods, with minimal impact on performance and latency, and offers seamless integration into existing RAG pipelines.
Canary tokens turn the tables on RAG extraction attacks, offering a plug-and-play runtime defense that detects leakage attempts with negligible performance overhead.
Retrieval-Augmented Generation (RAG) systems augment large language models with external knowledge, yet introduce a critical security vulnerability: RAG Knowledge Base Leakage, wherein adversarial prompts can induce the model to divulge retrieved proprietary content. Recent studies reveal that such leakage can be carried out through adaptive and iterative attack strategies (termed RAG extraction attacks), while effective countermeasures remain lacking. To bridge this gap, we propose CanaryRAG, a runtime defense mechanism inspired by stack canaries in software security. CanaryRAG embeds carefully designed canary tokens into retrieved chunks and reformulates RAG extraction defense as a dual-path runtime integrity game. Leakage is detected in real time whenever either the target or oracle path violates its expected canary behavior, including under adaptive suppression and obfuscation. Extensive evaluations against existing attacks demonstrate that CanaryRAG provides robust defense, achieving substantially lower chunk recovery rates than state-of-the-art baselines while imposing negligible impact on task performance and inference latency. Moreover, as a plug-and-play solution, CanaryRAG can be integrated into arbitrary RAG pipelines without retraining or structural modifications, offering a practical and scalable safeguard for proprietary data.
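As a rough illustration of the canary idea described in the abstract, the following minimal Python sketch embeds a random marker into retrieved chunks and flags a response when either of two paths deviates from its expected canary behavior. It is not the paper's implementation. The function names (`make_canary`, `embed_canary`, `check_paths`), the canary format, and the specific expectations for each path (the target output should never contain the canary, while the oracle output should always contain it) are assumptions made for this example; the abstract does not specify them.

```python
import secrets
from typing import List


def make_canary() -> str:
    # Random, hard-to-guess marker. The "CANARY-" prefix and the length
    # are illustrative assumptions, not the paper's token design.
    return f"CANARY-{secrets.token_hex(8)}"


def embed_canary(chunks: List[str], canary: str) -> List[str]:
    # Append the canary to every retrieved chunk before it enters the prompt.
    return [f"{chunk} [{canary}]" for chunk in chunks]


def check_paths(target_output: str, oracle_output: str, canary: str) -> bool:
    """Return True if leakage is suspected.

    Assumed expectations (hypothetical, for illustration only):
      - target path: the user-facing answer must never contain the canary;
      - oracle path: a probe that should echo the canary must contain it,
        so a missing canary hints at suppression by the adversarial prompt.
    """
    target_violation = canary in target_output
    oracle_violation = canary not in oracle_output
    return target_violation or oracle_violation


if __name__ == "__main__":
    canary = make_canary()
    marked = embed_canary(["Internal pricing policy.", "Q3 roadmap."], canary)
    # A verbatim dump of the first chunk carries the canary with it.
    print(check_paths(marked[0], canary, canary))
```

Exact substring matching, as used here, would not catch an obfuscated canary (for example, one emitted with inserted separators or in an encoded form); handling that case requires a more tolerant check than this sketch provides.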