TU Darmstadt | Mar 9, 2026 | arXiv:2603.08207

The Conundrum of Trustworthy Research on Attacking Personally Identifiable Information Removal Techniques

AI Summary

This paper analyzes the evaluation methodologies used in reconstruction attacks against PII removal techniques and finds that data leakage and data contamination are not properly mitigated, so the reported attack success is likely overestimated. The authors argue that only evaluations on truly private data can objectively assess the vulnerabilities of PII removal techniques. Because access to such data is heavily restricted, the public research community cannot study the problem in a transparent, reproducible, and trustworthy manner, which is the conundrum of the title.

Key Contribution

Reported successes in reconstructing PII from sanitized documents may be overstated due to data leakage, leaving the true vulnerability of PII removal techniques uncertain.

Abstract

Removing personally identifiable information (PII) from texts is necessary to comply with various data protection regulations and to enable data sharing without compromising privacy. However, recent works show that documents sanitized by PII removal techniques are vulnerable to reconstruction attacks. Yet, we suspect that the reported success of these attacks is largely overestimated. We critically analyze the evaluation of existing attacks and find that data leakage and data contamination are not properly mitigated, leaving the question whether or not PII removal techniques truly protect privacy in real-world scenarios unaddressed. We investigate possible data sources and attack setups that avoid data leakage and conclude that only truly private data can allow us to objectively evaluate vulnerabilities in PII removal techniques. However, access to private data is heavily restricted - and for good reasons - which also means that the public research community cannot address this problem in a transparent, reproducible, and trustworthy manner.

Data Curation & Synthetic Data | Natural Language Processing | Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 116

Year: 2026

Venue: N/A
