This paper introduces a method for differentially private (DP) data release in exponential families: release DP sufficient statistics, then perform noise-calibrated likelihood-based inference. The authors derive asymptotic normality results, explicit variance inflation, and valid Wald-style confidence intervals for the plug-in DP MLE, and further propose a noise-aware likelihood correction for improved uncertainty quantification. They also establish a matching minimax lower bound, which shows that the privacy-utility tradeoff is fundamental, and validate the approach empirically on three exponential families and real census data.
Unlock statistically valid inference from differentially private data by releasing only sufficient statistics, sidestepping the miscalibration issues of synthetic data and the uncertainty quantification limitations of point estimates.
Many differentially private (DP) data release systems either output DP synthetic data and leave analysts to perform inference as usual, which can lead to severe miscalibration, or output a DP point estimate without a principled way to do uncertainty quantification. This paper develops a clean and tractable middle ground for exponential families: release only DP sufficient statistics, then perform noise-calibrated likelihood-based inference and optional parametric synthetic data generation as post-processing. Our contributions are: (1) a general recipe for approximate-DP release of clipped sufficient statistics under the Gaussian mechanism; (2) asymptotic normality, explicit variance inflation, and valid Wald-style confidence intervals for the plug-in DP MLE; (3) a noise-aware likelihood correction that is first-order equivalent to the plug-in but supports bootstrap-based intervals; and (4) a matching minimax lower bound showing the privacy distortion rate is unavoidable. The resulting theory yields concrete design rules and a practical pipeline for releasing DP synthetic data with principled uncertainty quantification, validated on three exponential families and real census data.
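To make the pipeline concrete, the following is a minimal sketch for the simplest case, a Bernoulli model, and is not the paper's implementation. It makes several assumptions that do not come from the abstract: replace-one adjacency with a public sample size n, the classical Gaussian-mechanism calibration sigma = sensitivity * sqrt(2 ln(1.25/delta)) / epsilon (valid for epsilon < 1), and hypothetical helper names (`gaussian_sigma`, `dp_release_sum`, `bernoulli_plugin_ci`). The inflated variance used here is the elementary one that follows from adding independent noise to the sum, p(1-p)/n + sigma^2/n^2, and may differ in form from the paper's general result.

```python
import math
import numpy as np


def gaussian_sigma(sensitivity, epsilon, delta):
    """Classical Gaussian-mechanism noise scale (assumes 0 < epsilon < 1)."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def dp_release_sum(x, lo, hi, epsilon, delta, rng):
    """Clip each statistic to [lo, hi], sum, and add Gaussian noise.

    Under replace-one adjacency, the clipped sum changes by at most hi - lo.
    """
    clipped = np.clip(x, lo, hi)
    sigma = gaussian_sigma(hi - lo, epsilon, delta)
    return clipped.sum() + rng.normal(0.0, sigma), sigma


def bernoulli_plugin_ci(noisy_sum, n, sigma, z=1.96):
    """Plug-in estimate and Wald interval with noise-inflated variance."""
    # Projecting onto [0, 1] is post-processing, so it costs no privacy.
    p_hat = min(max(noisy_sum / n, 0.0), 1.0)
    # Sampling variance plus the variance contributed by the privacy noise.
    var = p_hat * (1.0 - p_hat) / n + sigma**2 / n**2
    half = z * math.sqrt(var)
    return p_hat, (p_hat - half, p_hat + half)


rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=5000)  # simulated data, not the census data
noisy_sum, sigma = dp_release_sum(x, 0.0, 1.0, epsilon=0.5, delta=1e-5, rng=rng)
p_hat, (ci_lo, ci_hi) = bernoulli_plugin_ci(noisy_sum, len(x), sigma)
print(f"p_hat={p_hat:.4f}, 95% CI=({ci_lo:.4f}, {ci_hi:.4f})")
```

For Bernoulli data the sufficient statistic already lies in [0, 1], so clipping introduces no bias. For unbounded statistics the clipping threshold trades bias against noise, and this naive interval would not account for that bias.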