Stanford HAI, AI Sec Lab, Beijing Chaitin Technology Co, Beijing University of Post and Telecommunications, CAS, North China Electric Power University

Apr 12, 2026

arXiv:2604.10717

Detecting RAG Extraction Attack via Dual-Path Runtime Integrity Game

Yuanbo Xie, Yingjie Zhang, Yulin Li, Shouyou Song, Xiaokun Chen, Zhihan Liu, Liya Su, Tingwen Liu

AI Summary

The paper introduces CanaryRAG, a runtime defense mechanism against RAG knowledge base leakage attacks that leverages canary tokens embedded in retrieved chunks. It frames the defense as a dual-path runtime integrity game, monitoring both target and oracle paths for canary behavior violations to detect leakage attempts. Experiments show CanaryRAG significantly reduces chunk recovery rates compared to existing methods, with minimal impact on performance and latency, and offers seamless integration into existing RAG pipelines.

Key Contribution

Canary tokens turn the tables on RAG extraction attacks, offering a plug-and-play runtime defense that detects leakage attempts with negligible performance overhead.

Abstract

Retrieval-Augmented Generation (RAG) systems augment large language models with external knowledge, yet introduce a critical security vulnerability: RAG Knowledge Base Leakage, wherein adversarial prompts can induce the model to divulge retrieved proprietary content. Recent studies reveal that such leakage can be executed through adaptive and iterative attack strategies (termed RAG extraction attacks), while effective countermeasures remain notably lacking. To bridge this gap, we propose CanaryRAG, a runtime defense mechanism inspired by stack canaries in software security. CanaryRAG embeds carefully designed canary tokens into retrieved chunks and reformulates RAG extraction defense as a dual-path runtime integrity game. Leakage is detected in real time whenever either the target or oracle path violates its expected canary behavior, including under adaptive suppression and obfuscation. Extensive evaluations against existing attacks demonstrate that CanaryRAG provides robust defense, achieving substantially lower chunk recovery rates than state-of-the-art baselines while imposing negligible impact on task performance and inference latency. Moreover, as a plug-and-play solution, CanaryRAG can be seamlessly integrated into arbitrary RAG pipelines without requiring retraining or structural modifications, offering a practical and scalable safeguard for proprietary data.
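The canary idea in the abstract can be illustrated with a small toy sketch. This is not the paper's implementation: the canary format, the function names (`make_canary`, `embed_canaries`, `check_paths`), and above all the per-path rules are assumptions made for illustration. The abstract does not say what the "expected canary behavior" of each path is; the sketch assumes that the target path should never emit a canary and that the oracle path should reproduce every canary, so a missing canary there hints at suppression or obfuscation.

```python
import secrets


def make_canary(n_bytes: int = 8) -> str:
    """Return a random, hard-to-guess marker string (hypothetical format)."""
    return f"[[CANARY-{secrets.token_hex(n_bytes)}]]"


def embed_canaries(chunks):
    """Append a fresh canary to each retrieved chunk.

    Returns the tagged chunks and the list of canaries that were embedded.
    """
    canaries = [make_canary() for _ in chunks]
    tagged = [f"{chunk} {canary}" for chunk, canary in zip(chunks, canaries)]
    return tagged, canaries


def check_paths(target_output: str, oracle_output: str, canaries) -> bool:
    """Flag a suspected leakage attempt under the ASSUMED rules:

    - target path: no canary may appear in the user-facing output;
    - oracle path: every canary must appear in the oracle output.
    """
    target_leak = any(c in target_output for c in canaries)
    oracle_suppressed = not all(c in oracle_output for c in canaries)
    return target_leak or oracle_suppressed
```

In a pipeline, `embed_canaries` would run on the retrieved chunks before prompt construction, and `check_paths` would run on the two model outputs before anything is returned to the user. Exact substring matching is the simplest possible check and would miss paraphrased or encoded canaries, which is presumably part of what the paper's canary design addresses.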

Natural Language Processing, Recommendation & Information Retrieval, Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
