Basel · University of Zürich · Mar 3, 2026 · arXiv:2603.03226

Adaptive Methods Are Preferable in High Privacy Settings: An SDE Perspective

Enea Monzio Compagnoni, Alessandro Stanghellini, R. Islamov, Aurélien Lucchi, Anastasiia Koloskova

AI Summary

This paper analyzes how differential privacy (DP) noise interacts with adaptive optimization methods, using stochastic differential equations (SDEs). Under fixed hyperparameters, the privacy-utility trade-off of DP-SGD scales as O(1/ε²) while that of DP-SignSGD scales as O(1/ε), so DP-SignSGD dominates in high-privacy (small ε) or large batch noise regimes, even though its convergence speed is linear in ε whereas that of DP-SGD is independent of ε. Under optimal learning rates the two methods achieve comparable asymptotic performance, but the optimal learning rate of DP-SGD scales linearly with ε while that of DP-SignSGD is essentially ε-independent. Adaptive methods such as DP-SignSGD and DP-Adam are therefore more practical, since they need little or no re-tuning across privacy levels.

Key Contribution

The hyperparameters of adaptive optimizers such as DP-SignSGD and DP-Adam transfer across privacy levels with little or no re-tuning, unlike DP-SGD, whose optimal learning rate scales linearly with ε and therefore shrinks as the privacy requirement tightens.

Abstract

Differential Privacy (DP) is becoming central to large-scale training as privacy regulations tighten. We revisit how DP noise interacts with adaptivity in optimization through the lens of stochastic differential equations, providing the first SDE-based analysis of private optimizers. Focusing on DP-SGD and DP-SignSGD under per-example clipping, we show a sharp contrast under fixed hyperparameters: DP-SGD converges at a Privacy-Utility Trade-Off of $\mathcal{O}(1/\varepsilon^2)$ with speed independent of $\varepsilon$, while DP-SignSGD converges at a speed linear in $\varepsilon$ with an $\mathcal{O}(1/\varepsilon)$ trade-off, dominating in high-privacy or large batch noise regimes. By contrast, under optimal learning rates, both methods achieve comparable theoretical asymptotic performance; however, the optimal learning rate of DP-SGD scales linearly with $\varepsilon$, while that of DP-SignSGD is essentially $\varepsilon$-independent. This makes adaptive methods far more practical, as their hyperparameters transfer across privacy levels with little or no re-tuning. Empirical results confirm our theory across training and test metrics, and empirically extend from DP-SignSGD to DP-Adam.
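To make the two update rules concrete, here is a minimal NumPy sketch of one DP-SGD step and one DP-SignSGD step under per-example clipping. It assumes a common textbook formulation (clip each example's gradient, sum, add Gaussian noise, average, then optionally take the sign); the paper's exact definitions may differ. The function names and the `clip_norm` and `noise_multiplier` parameters are illustrative, and the mapping from ε to `noise_multiplier` (normally done by a privacy accountant) is not shown.

```python
import numpy as np


def clip_per_example(grads, clip_norm):
    """Rescale each row (one example's gradient) to L2 norm at most clip_norm."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    return grads * scale


def private_mean_gradient(grads, clip_norm, noise_multiplier, rng):
    """Clip per example, sum, add Gaussian noise, and divide by the batch size."""
    batch_size, dim = grads.shape
    clipped_sum = clip_per_example(grads, clip_norm).sum(axis=0)
    # Assumption: noise std = noise_multiplier * clip_norm; a smaller epsilon
    # (stronger privacy) corresponds to a larger noise_multiplier.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=dim)
    return (clipped_sum + noise) / batch_size


def dp_sgd_step(params, grads, lr, clip_norm, noise_multiplier, rng):
    """DP-SGD: move along the noisy clipped mean gradient."""
    g = private_mean_gradient(grads, clip_norm, noise_multiplier, rng)
    return params - lr * g


def dp_signsgd_step(params, grads, lr, clip_norm, noise_multiplier, rng):
    """DP-SignSGD (assumed form): use only the sign of each noisy coordinate."""
    g = private_mean_gradient(grads, clip_norm, noise_multiplier, rng)
    return params - lr * np.sign(g)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = np.zeros(2)
    grads = np.array([[3.0, 4.0], [0.3, 0.4]])  # two per-example gradients
    print(dp_sgd_step(params, grads, 0.1, 1.0, 2.0, rng))
    print(dp_signsgd_step(params, grads, 0.1, 1.0, 2.0, rng))
```

In this sketch the DP-SGD step grows with the injected noise, while each coordinate of the DP-SignSGD step has magnitude at most `lr` whatever the noise scale. This is only an intuition for why a learning rate might transfer across privacy levels; it is not the SDE argument the paper makes.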

Constitutional AI & AI Ethics · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 77

Year: 2026

Venue: N/A
