Binjiang Institute of Zhejiang University
Even when RAG models detect poisoned information, they still act on it, but a new architecture can close this "monitoring-control gap" and slash attack success by 92%.
Aggregate benchmark scores can be misleading: models with statistically indistinguishable atomic knowledge can exhibit composition behavior differences exceeding 40 percentage points.
RAG systems can *know* the evidence contradicts their actions, yet still fail to act safely, revealing a dangerous monitoring-control gap that current evaluations miss.