Google Research, Columbia, Waterloo | Jun 1, 2026 | arXiv:2606.01849

ContinuousBench: Can Differentially Private Synthetic Text Improve Capabilities?

Peihan Liu, Lucas Rosenblatt, Weiwei Kong, Natalia Ponomareva, Gautam Kamath, Rachel Cummings, Roxana Geambasu, Yu Gan, Lillian Tsai, Alex Bie

AI Summary

This paper introduces ContinuousBench, a benchmark that evaluates the capability gains from differentially private (DP) synthetic text by pairing new training corpora with QA sets that cannot be solved without access to the corpus. The authors show that non-private synthetic data transfers substantial knowledge from the original corpus, whereas state-of-the-art DP synthesis methods generally fail to do so, even at a loose privacy budget (ε=100). The work highlights the limits of current DP synthesis techniques for knowledge transfer and raises questions about their utility for sensitive data.

Key Contribution

Non-private synthetic data can effectively transfer knowledge from original corpora, while state-of-the-art DP methods generally fail to do so, even at a loose privacy budget (ε=100).

Abstract

Differentially private (DP) text synthesis promises to unlock sensitive corpora for model training, but it remains unclear whether DP synthetic data transmits genuinely new knowledge and capabilities present only in those corpora. This is because existing evaluations rely on tasks that are nearly solvable without training, so strong benchmark performance does not establish that DP synthesis can substitute original data access. Thus, we introduce ContinuousBench, a continuously and automatically-regenerated benchmark that measures capability gain from DP synthetic text. Each quarter, a new release pairs a never-before-seen training corpus with a derived QA set, constructed to be: (1) unsolvable sans-corpus; and (2) learnable under DP, as the tested knowledge is supported by hundreds of independent records. Researchers produce DP synthetic data from the training corpus and run our standardized training and evaluation harness on their synthetic data to measure gains. We instantiate two tracks: Geminon, a procedurally-generated dataset about fictional creatures; and News, a stream of newly crawled public news articles. Although standard benchmarks are nearly saturated, on ContinuousBench we find that non-private synthesis transfers substantial knowledge from the original corpus, while state-of-the-art DP synthesis methods generally fail to do so, even at $\varepsilon=100$.
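
For context on the $\varepsilon=100$ figure, the standard textbook definition of $(\varepsilon,\delta)$-differential privacy is sketched below. The abstract does not restate it, so the symbols $M$ (the synthesis mechanism), $D$ and $D'$ (neighboring datasets), $S$ (a set of outputs), and $\delta$ are conventional notation assumed here rather than taken from the paper, and the paper's exact notion of neighboring datasets is not specified in this listing.

$$
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta
\qquad \text{for all neighboring } D, D' \text{ and all output sets } S.
$$

Smaller $\varepsilon$ means a stronger guarantee. At $\varepsilon=100$ the multiplicative factor is $e^{100} \approx 2.7 \times 10^{43}$, so the bound is very loose, which is why failure to transfer knowledge even at this setting is notable.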

Data Curation & Synthetic Data | Eval Frameworks & Benchmarks | Natural Language Processing


Year: 2026

Venue: N/A
