May 5, 2026 | arXiv:2605.04116

Membership Inference Attacks for Retrieval Based In-Context Learning for Document Question Answering

Tejas D. Kulkarni, Antti Koskela, Laith Zumot

AI Summary

This paper investigates membership inference attacks against retrieval-augmented in-context learning systems for document question answering, demonstrating vulnerability even with separate service providers and users. Two novel black-box attacks are proposed: one uses a reference model to estimate loss, and the other employs a weighted-averaging scheme to compute a membership statistic without a reference model. Experiments show these attacks outperform existing methods, even with paraphrased queries, and an ensemble prompting defense is shown to mitigate the privacy leakage.

Key Contribution

Retrieval-augmented in-context learning can leak whether a given input is among the private data behind the service, even when the attacker has only a paraphrased version of the query text.

Abstract

We show that remotely hosted applications that use in-context learning, augmented with a retrieval function to select in-context examples, can be vulnerable to membership inference attacks even when the service provider and the users are separate parties. We propose two black-box membership inference attacks that exploit query text prefixes to distinguish member from non-member inputs. The first attack uses a reference model to estimate an otherwise unavailable loss metric. The second attack improves on it by eliminating the reference model and instead computing a membership statistic through a simple but novel weighted-averaging scheme. Our empirical evaluation considers a stricter case in which the adversary has a paraphrased version of the text in the queries, and shows that our attacks can exhibit stronger resilience to paraphrasing and, in many cases, outperform three prior attacks with a small number of prefixes. We also adapt an existing ensemble prompting defense to our setting and demonstrate that it substantially mitigates the privacy leakage caused by our second attack.
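The abstract does not give the scoring function or the weights used in the second attack, so the following is only a minimal illustrative sketch of the general idea of a prefix-based, weighted-average membership statistic, not the authors' method. Everything named here is an assumption made for illustration: `make_prefixes`, `overlap_score`, `membership_statistic`, the word-overlap score, the inverse-length weights, and the `toy_service` stand-in for a remotely hosted application.

```python
from typing import Callable, List


def make_prefixes(text: str, num_prefixes: int) -> List[str]:
    """Return word-level prefixes of increasing length (hypothetical helper)."""
    words = text.split()
    if num_prefixes < 1 or len(words) < 2:
        return []
    # Evenly spaced cut points; each cut leaves at least one word as suffix.
    cuts = sorted(
        {max(1, (len(words) - 1) * k // num_prefixes)
         for k in range(1, num_prefixes + 1)}
    )
    return [" ".join(words[:c]) for c in cuts]


def overlap_score(response: str, suffix: str) -> float:
    """Fraction of suffix words that appear in the response (assumed score)."""
    suffix_words = suffix.split()
    if not suffix_words:
        return 0.0
    response_words = set(response.split())
    return sum(w in response_words for w in suffix_words) / len(suffix_words)


def membership_statistic(
    text: str,
    query_service: Callable[[str], str],
    num_prefixes: int = 4,
) -> float:
    """Weighted average of per-prefix scores; higher suggests membership."""
    words = text.split()
    prefixes = make_prefixes(text, num_prefixes)
    if not prefixes:
        return 0.0
    scores, weights = [], []
    for prefix in prefixes:
        k = len(prefix.split())
        suffix = " ".join(words[k:])
        scores.append(overlap_score(query_service(prefix), suffix))
        # Assumed weighting: shorter prefixes count more. This choice is
        # arbitrary and made only for the illustration.
        weights.append(1.0 / k)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def toy_service(memory: List[str]) -> Callable[[str], str]:
    """Toy black-box service that completes prefixes of stored documents."""
    def query(prefix: str) -> str:
        for doc in memory:
            if doc.startswith(prefix):
                return doc[len(prefix):]
        return ""
    return query


member = "the quick brown fox jumps over the lazy dog today"
non_member = "an entirely different sentence that was never stored anywhere"
service = toy_service([member])
member_stat = membership_statistic(member, service)
non_member_stat = membership_statistic(non_member, service)
```

In this toy setup the statistic is high for the stored document and low for the unseen one; an attacker would compare it against a threshold. A real service would return free-form answers, so the score and the threshold would both need calibration.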

Natural Language Processing | Recommendation & Information Retrieval | Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 39

Year: 2026

Venue: N/A
