ByteDance · Helmholtz · Jun 8, 2026 · arXiv:2606.09401

Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models

Bartłomiej Marek, Lorenzo Rossi, Vincent Hanke, Xun Wang, Michael Backes, Franziska Boenisch, Adam Dziedzic

AI Summary

This study benchmarks the empirical privacy risks of differentially private (DP) adaptations of large language models (LLMs) using robust membership inference and canary data extraction attacks. It finds that the distribution of the adaptation data strongly affects privacy vulnerability: the closer the adaptation data is to the pretraining distribution, the higher the risk at the same theoretical guarantee, even without direct overlap. Parameter-efficient fine-tuning methods such as LoRA achieve the highest empirical privacy protection for out-of-distribution data, a finding relevant to deploying LLMs in sensitive applications.

Key Contribution

The closer the adaptation data is to the pretraining distribution, the higher the empirical privacy risk at the same differential privacy guarantee, even without direct data overlap.

Abstract

Recent work has applied differential privacy (DP) to adapt large language models (LLMs) for sensitive applications, offering theoretical guarantees. However, its practical effectiveness remains unclear, partly due to LLM pretraining, where overlaps and interdependencies with adaptation data can undermine privacy despite DP efforts. To analyze this issue in practice, we investigate privacy risks under DP adaptations in LLMs using state-of-the-art attacks such as robust membership inference and canary data extraction. We benchmark these risks by systematically varying the adaptation data distribution, from exact overlaps with pretraining data, through in-distribution (IID) cases, to entirely out-of-distribution (OOD) examples. Additionally, we evaluate how different adaptation methods and different privacy regimes impact the vulnerability. Our results show that distribution shifts strongly influence privacy vulnerability: the closer the adaptation data is to the pretraining distribution, the higher the practical privacy risk at the same theoretical guarantee, even without direct data overlap. We find that parameter-efficient fine-tuning methods, such as LoRA, achieve the highest empirical privacy protection for OOD data. Our benchmark identifies key factors for achieving practical privacy in DP LLM adaptation, providing actionable insights for deploying customized models in sensitive settings. Looking forward, we propose a structured framework for holistic privacy assessment beyond adaptation privacy, to identify and evaluate risks across the full pretrain-adapt pipeline of LLMs.
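To make the gap between "practical privacy risk" and "theoretical guarantee" concrete, here is a minimal Python sketch. It is not the paper's attack: it uses a plain loss-threshold membership inference score as a simpler stand-in for the robust membership inference attack named in the abstract. The function names and loss values are hypothetical and invented for illustration. The sketch computes the attack's AUC from per-example losses and compares it against one standard hypothesis-testing bound that (ε, δ)-DP places on any attack, TPR ≤ e^ε · FPR + δ.

```python
import math


def attack_auc(member_losses, nonmember_losses):
    """AUC of a loss-threshold membership inference attack.

    Lower loss is treated as evidence of membership, so the AUC equals the
    probability that a random member has a lower loss than a random
    non-member (ties count as one half).
    """
    wins = 0.0
    for m in member_losses:
        for n in nonmember_losses:
            if m < n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_losses) * len(nonmember_losses))


def dp_tpr_upper_bound(epsilon, delta, fpr):
    """Upper bound on an attack's TPR at a given FPR under (epsilon, delta)-DP."""
    return min(1.0, math.exp(epsilon) * fpr + delta)


# Hypothetical per-example losses of the adapted model (made-up numbers).
members = [0.9, 1.1, 1.4, 2.0]       # examples used for adaptation
nonmembers = [1.2, 1.5, 1.9, 2.3]    # held-out examples

auc = attack_auc(members, nonmembers)  # 0.75 for these made-up numbers
bound = dp_tpr_upper_bound(epsilon=1.0, delta=1e-5, fpr=0.01)
```

An AUC near 0.5 means the attack cannot tell members from non-members. In a benchmark of the kind described above, one would compute such empirical scores for overlapping, in-distribution, and OOD adaptation data at the same ε and compare them with each other and with the theoretical bound.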

Data Curation & Synthetic Data · Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
