The paper introduces DP-FedAdamW, a novel AdamW-based optimizer designed to address challenges in Differentially Private Federated Learning (DPFL) arising from data heterogeneity, privacy noise, and sensitivity to local overfitting. DP-FedAdamW stabilizes the variance of the second-moment estimator, removes DP-induced bias, and aligns local updates to the global descent, thereby mitigating client drift. Theoretical analysis demonstrates an unbiased second-moment estimator and a linearly accelerated convergence rate with tighter $(\varepsilon,\delta)$-DP guarantees, while empirical results show significant performance improvements over existing methods on language and vision tasks.
AdamW, a popular optimizer for large models, can now be used in differentially private federated learning without sacrificing convergence speed or accuracy, thanks to a new bias-corrected and variance-stabilized variant.
Balancing convergence efficiency and robustness under Differential Privacy (DP) is a central challenge in Federated Learning (FL). While AdamW accelerates training and fine-tuning in large-scale models, we find that directly applying it to Differentially Private FL (DPFL) suffers from three major issues: (i) data heterogeneity and privacy noise jointly amplify the variance of the second-moment estimator, (ii) DP perturbations bias the second-moment estimator, and (iii) DP amplifies AdamW's sensitivity to local overfitting, worsening client drift. We propose DP-FedAdamW, the first AdamW-based optimizer for DPFL. It restores AdamW under DP by stabilizing second-moment variance, removing DP-induced bias, and aligning local updates to the global descent to curb client drift. Theoretically, we establish an unbiased second-moment estimator and prove a linearly accelerated convergence rate without any heterogeneity assumption, while providing tighter $(\varepsilon,\delta)$-DP guarantees. Our empirical results demonstrate the effectiveness of DP-FedAdamW across language and vision Transformers and ResNet-18. On Tiny-ImageNet (Swin-Base, $\varepsilon=1$), DP-FedAdamW outperforms the state-of-the-art (SOTA) by 5.83%. The code is available in the Appendix.
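Issue (ii) can be illustrated with a small numerical sketch. If zero-mean Gaussian noise with standard deviation $\sigma$ is added to a gradient coordinate $g$, the squared noisy gradient has expectation $g^2 + \sigma^2$, so an Adam-style second-moment average is inflated by $\sigma^2$. The code below is only an illustration of that effect under simplifying assumptions (a single coordinate, a constant gradient, a known $\sigma$, no clipping). The function name `noisy_second_moment` and the subtract-$\sigma^2$ correction are hypothetical and are not taken from the paper, whose actual estimator is not described in the abstract.

```python
import random


def noisy_second_moment(grad, sigma, steps, beta2=0.999, correct=False, seed=0):
    """Bias-corrected EMA of the squared DP-noised gradient for one coordinate.

    Hypothetical helper for illustration only: `grad` is held constant and
    `sigma` is assumed known. If `correct` is True, the known noise variance
    sigma**2 is subtracted from each squared sample before averaging.
    """
    rng = random.Random(seed)
    v = 0.0
    for _ in range(steps):
        # DP mechanism (simplified): add zero-mean Gaussian noise to the gradient.
        g_tilde = grad + rng.gauss(0.0, sigma)
        sq = g_tilde * g_tilde
        if correct:
            # E[g_tilde**2] = grad**2 + sigma**2, so remove the sigma**2 offset.
            sq -= sigma * sigma
        v = beta2 * v + (1.0 - beta2) * sq
    # Standard Adam-style correction for the zero initialization of v.
    return v / (1.0 - beta2 ** steps)


if __name__ == "__main__":
    grad, sigma, steps = 0.1, 1.0, 20000
    naive = noisy_second_moment(grad, sigma, steps)
    corrected = noisy_second_moment(grad, sigma, steps, correct=True)
    print(f"true g^2        : {grad ** 2:.4f}")
    print(f"naive estimate  : {naive:.4f}")
    print(f"corrected       : {corrected:.4f}")
```

With the same seed, the two estimates differ by $\sigma^2$ up to floating-point error, which is the bias the noise introduces. The corrected estimate can dip below zero on individual runs, so a practical optimizer would need some safeguard (for example clamping) before taking a square root; how DP-FedAdamW handles this is not stated in the abstract.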