This paper introduces Realistic and Privacy-Preserving Synthetic Data Generation (RPSG), a method that combines large language models (LLMs) with differential privacy to generate synthetic text from private seeds. RPSG balances data fidelity against privacy by incorporating formal differential privacy mechanisms into the generation process. Experiments show that RPSG outperforms existing private synthetic data generation methods, achieving both high fidelity and strong privacy guarantees.
RPSG generates realistic synthetic data from private seeds with strong privacy protection by combining LLMs with differential privacy, and it outperforms existing methods.
Large language models (LLMs) have emerged as a powerful tool for synthetic data generation. A particularly important use case is producing synthetic replicas of private text, which requires carefully balancing privacy and utility. We propose Realistic and Privacy-Preserving Synthetic Data Generation (RPSG), which combines privacy-preserving mechanisms, including formal differential privacy (DP), with private seeds (in particular, text containing personal information) to generate realistic synthetic data. Comprehensive experiments against state-of-the-art private synthetic data generation methods demonstrate that RPSG achieves high fidelity to the private data while providing strong privacy protection.