Search papers, labs, and topics across Lattice.
The paper demonstrates that simply detecting poisoned documents in RAG systems is insufficient to prevent adversarial manipulation of generated outputs due to a "monitoring-control gap". To address this, they introduce CORDON-MAS, a compartmentalized RAG architecture that enforces information-flow control by separating evidence extraction, audit, and synthesis into distinct agents with restricted memory access. Experiments across five BEIR datasets show that CORDON-MAS significantly reduces attack success rates by 92.4% compared to standard RAG.
Even when RAG models detect poisoned information, they still act on it, but a new architecture can close this "monitoring-control gap" and slash attack success by 92%.
Retrieval-augmented generation (RAG) increasingly underpins high-stakes applications, yet remains vulnerable to Confundo-style poisoning where adversarially optimized documents manipulate generated outputs. Existing defenses assume that detecting poisoned evidence prevents harm. We show this assumption is incorrect: models exhibit a monitoring-control gap -- they can detect contradictions in retrieved evidence yet still act on poisoned claims. We introduce the Cordon Principle -- no agent capable of final synthesis may access untrusted natural-language evidence -- and realize it through CORDON-MAS, a compartmentalized framework that enforces this principle architecturally by separating evidence extraction, cross-source audit, and answer synthesis into agents with asymmetric memory privileges. Across five BEIR datasets, CORDON-MAS reduces attack success rate by 92.4\% relative to undefended RAG. This reframes RAG poisoning from a detection problem to an information-flow control problem.