This paper uses stochastic differential equations (SDEs) to analyze how differential privacy (DP) noise interacts with adaptive optimization methods. Under fixed hyperparameters, DP-SignSGD outperforms DP-SGD in high-privacy regimes because its privacy-utility trade-off scales as O(1/ε) rather than O(1/ε²), even though its convergence speed is linear in ε while that of DP-SGD is independent of ε. Under optimal learning rates the two methods achieve comparable theoretical performance, but the optimal learning rate of DP-SGD scales linearly with ε, whereas that of DP-SignSGD is essentially ε-independent. Adaptive methods such as DP-SignSGD and DP-Adam are therefore more practical, since their hyperparameters need little or no re-tuning across privacy levels.
Less hyperparameter tuning: adaptive optimizers such as DP-SignSGD and DP-Adam keep an essentially ε-independent optimal learning rate across privacy levels, unlike DP-SGD, whose optimal learning rate shrinks linearly with ε.
Differential Privacy (DP) is becoming central to large-scale training as privacy regulations tighten. We revisit how DP noise interacts with adaptivity in optimization through the lens of stochastic differential equations, providing the first SDE-based analysis of private optimizers. Focusing on DP-SGD and DP-SignSGD under per-example clipping, we show a sharp contrast under fixed hyperparameters: DP-SGD converges at a Privacy-Utility Trade-Off of $\mathcal{O}(1/\varepsilon^2)$ with speed independent of $\varepsilon$, while DP-SignSGD converges at a speed linear in $\varepsilon$ with an $\mathcal{O}(1/\varepsilon)$ trade-off, dominating in high-privacy or large batch noise regimes. By contrast, under optimal learning rates, both methods achieve comparable theoretical asymptotic performance; however, the optimal learning rate of DP-SGD scales linearly with $\varepsilon$, while that of DP-SignSGD is essentially $\varepsilon$-independent. This makes adaptive methods far more practical, as their hyperparameters transfer across privacy levels with little or no re-tuning. Empirical results confirm our theory across training and test metrics, and empirically extend from DP-SignSGD to DP-Adam.
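The two update rules compared in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes the common formulation in which each per-example gradient is clipped to an L2 norm of at most `clip_norm`, Gaussian noise with standard deviation `noise_multiplier * clip_norm` is added to the sum, and DP-SignSGD takes the elementwise sign of the resulting privatized average. The function and parameter names (`privatize_gradient`, `dp_sgd_step`, `dp_signsgd_step`, `clip_norm`, `noise_multiplier`) are hypothetical.

```python
import numpy as np


def privatize_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average.

    per_example_grads has shape (batch_size, dim). The noise scale
    noise_multiplier * clip_norm is an assumed, commonly used convention.
    """
    batch_size, dim = per_example_grads.shape
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Scale factor is 1 for gradients already within the clipping norm.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped_sum = (per_example_grads * scale).sum(axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=dim)
    return (clipped_sum + noise) / batch_size


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD step: move along the privatized average gradient."""
    g = privatize_gradient(per_example_grads, clip_norm, noise_multiplier, rng)
    return params - lr * g


def dp_signsgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SignSGD step (assumed form): move along the sign of the
    privatized average gradient, so each coordinate changes by at most lr."""
    g = privatize_gradient(per_example_grads, clip_norm, noise_multiplier, rng)
    return params - lr * np.sign(g)
```

In practice the noise multiplier is derived from the target (ε, δ) by a privacy accountant, and a smaller ε means more noise. In this sketch the DP-SGD update grows in magnitude with the injected noise, while the DP-SignSGD update stays bounded by `lr` per coordinate. That gives some intuition for why the learning rate of DP-SignSGD might need less adjustment as ε changes, although the paper's argument rests on its SDE analysis rather than on this observation.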