This paper investigates membership inference attacks against retrieval-augmented in-context learning systems for document question answering, showing that such systems are vulnerable even when the service provider and the users are separate parties. It proposes two black-box attacks: one uses a reference model to estimate an otherwise unavailable loss, and the other computes a membership statistic with a weighted-averaging scheme that needs no reference model. Experiments show that these attacks outperform three prior attacks in many cases, even when the adversary has only paraphrased versions of the target text, and that an adapted ensemble prompting defense substantially mitigates the leakage caused by the second attack.
Retrieval-augmented in-context learning, despite its benefits, can leak membership information about its private retrieval data, even when the attacker has only paraphrased versions of the queried text.
We show that remotely hosted applications that use in-context learning, augmented with a retrieval function that selects the in-context examples, can be vulnerable to membership-inference attacks even when the service provider and the users are separate parties. We propose two black-box membership inference attacks that exploit query text prefixes to distinguish member from non-member inputs. The first attack uses a reference model to estimate an otherwise unavailable loss metric. The second attack improves on it by eliminating the reference model and instead computing a membership statistic through a simple but novel weighted-averaging scheme. Our comprehensive empirical evaluations consider a stricter setting in which the adversary has only a paraphrased version of the text in the queries, and show that our attacks can be more resilient to paraphrasing and outperform three prior attacks in many cases with a small number of prefixes. We also adapt an existing ensemble prompting defense to our setting and demonstrate that it substantially mitigates the privacy leakage caused by our second attack.
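The abstract does not spell out either attack. Purely as an illustration of the general idea of a prefix-based black-box membership statistic, the Python sketch below assumes the service returns generated text for each query. It cuts a candidate text into prefixes, measures how much of each true continuation the service reproduces, and takes a weighted average. The positional word-overlap similarity, the continuation-length weights, the 0.5 threshold, and every name (`query_fn`, `membership_score`, `toy_service`) are hypothetical. This is not the authors' weighted-averaging scheme, and it ignores paraphrasing.

```python
from typing import Callable, Sequence

# HYPOTHETICAL illustration only: not the paper's attack or weighting scheme.
# A QueryFn stands in for a black-box retrieval-augmented ICL service that
# returns generated text for a given query string.
QueryFn = Callable[[str], str]


def split_prefixes(words: Sequence[str], num_prefixes: int) -> list[tuple[str, str]]:
    """Cut the text at evenly spaced word positions into (prefix, true continuation) pairs."""
    n = len(words)
    pairs = []
    for k in range(1, num_prefixes + 1):
        cut = k * n // (num_prefixes + 1)
        pairs.append((" ".join(words[:cut]), " ".join(words[cut:])))
    return pairs


def positional_overlap(generated: str, reference: str) -> float:
    """Fraction of reference words reproduced at the same position (assumed similarity)."""
    ref = reference.split()
    if not ref:
        return 0.0
    matches = sum(1 for g, r in zip(generated.split(), ref) if g == r)
    return matches / len(ref)


def membership_score(text: str, query_fn: QueryFn, num_prefixes: int = 4) -> float:
    """Weighted average of per-prefix similarities; weights = continuation length (assumed)."""
    pairs = split_prefixes(text.split(), num_prefixes)
    weights = [len(cont.split()) for _, cont in pairs]
    scores = [positional_overlap(query_fn(prefix), cont) for prefix, cont in pairs]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total if total else 0.0


def is_member(text: str, query_fn: QueryFn, threshold: float = 0.5, num_prefixes: int = 4) -> bool:
    """Flag the text as a member when its score reaches an (uncalibrated) threshold."""
    return membership_score(text, query_fn, num_prefixes) >= threshold


# Toy stand-in for the service: it has "retrieved" one private document and
# completes any query that is a prefix of it; otherwise it gives a generic answer.
MEMBER = "the quarterly report shows revenue grew by ten percent in the north region"
NON_MEMBER = "the committee approved a new budget for road repairs next spring"


def toy_service(query: str) -> str:
    if query and MEMBER.startswith(query):
        return MEMBER[len(query):].strip()
    return "i do not know"


print(membership_score(MEMBER, toy_service), membership_score(NON_MEMBER, toy_service))
# prints: 1.0 0.0
```

The 0.5 threshold is arbitrary; a real attack would need to calibrate it, for example on texts known not to be in the retrieval set. The reference-model variant described in the abstract is not sketched here.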