Search papers, labs, and topics across Lattice.
HeteroFedSyn is introduced as the first differentially private tabular data synthesis framework tailored for horizontal federated settings with heterogeneous data distributions. It builds upon the PrivSyn paradigm, incorporating innovations like an L2-based dependency metric with random projection, an unbiased estimator for multiplicative noise correction, and an adaptive selection strategy for distributed marginal selection. Experiments demonstrate that HeteroFedSyn achieves utility comparable to centralized synthesis, despite the increased noise in federated execution.
Federated differentially private data synthesis can now achieve utility comparable to centralized approaches, even with heterogeneous data distributions, thanks to a novel framework that smartly handles noise and redundancy.
Traditional Differential Privacy (DP) mechanisms are typically tailored to specific analysis tasks, which limits the reusability of protected data. DP tabular data synthesis overcomes this by generating synthetic datasets that can be shared for arbitrary downstream tasks. However, existing synthesis methods predominantly assume centralized or local settings and overlook the more practical horizontal federated scenario. Naively synthesizing data locally or perturbing individual records either produces biased mixtures or introduces excessive noise, especially under heterogeneous data distributions across participants. We propose HeteroFedSyn, the first DP tabular data synthesis framework designed specifically for the horizontal federated setting. Built upon the PrivSyn paradigm of 2-way marginal-based synthesis, HeteroFedSyn introduces three key innovations for distributed marginal selection: (i) an L2-based dependency metric with random projection for noise-efficient correlation measurement, (ii) an unbiased estimator to correct multiplicative noise, and (iii) an adaptive selection strategy that dynamically updates dependency scores to avoid redundancy. Extensive experiments on range queries, Wasserstein fidelity, and machine learning tasks show that, despite the increased noise inherent to federated execution, HeteroFedSyn achieves utility comparable to centralized synthesis. Our code is open-sourced via the link.