Mar 5, 2026 | arXiv:2603.04881

Differential Privacy in Two-Layer Networks: How DP-SGD Harms Fairness and Robustness

AI Summary

This paper presents a feature-centric theoretical framework to analyze the impact of differentially private stochastic gradient descent (DP-SGD) on two-layer ReLU convolutional neural networks. The analysis uses the feature-to-noise ratio (FNR) to bound test loss and demonstrates that DP noise leads to suboptimal feature learning, causing disparate impact across classes/subpopulations, harming performance on semantically long-tailed data, and increasing vulnerability to adversarial attacks. The work also shows that public pre-training and private fine-tuning may not always improve performance, especially with feature distribution shifts.

Key Contribution

Differential privacy's noise injection does not just hurt accuracy: it actively warps feature learning, leading to unfair outcomes, poor performance on rare data, and increased vulnerability to adversarial attacks. Public pre-training followed by private fine-tuning is not guaranteed to fix this, particularly under feature distribution shift.

Abstract

Differentially private learning is essential for training models on sensitive data, but empirical studies consistently show that it can degrade performance, introduce fairness issues like disparate impact, and reduce adversarial robustness. The theoretical underpinnings of these phenomena in modern, non-convex neural networks remain largely unexplored. This paper introduces a unified feature-centric framework to analyze the feature learning dynamics of differentially private stochastic gradient descent (DP-SGD) in two-layer ReLU convolutional neural networks. Our analysis establishes test loss bounds governed by a crucial metric: the feature-to-noise ratio (FNR). We demonstrate that the noise required for privacy leads to suboptimal feature learning, and specifically show that: 1) imbalanced FNRs across classes and subpopulations cause disparate impact; 2) even in the same class, noise has a greater negative impact on semantically long-tailed data; and 3) noise injection exacerbates vulnerability to adversarial attacks. Furthermore, our analysis reveals that the popular paradigm of public pre-training and private fine-tuning does not guarantee improvement, particularly under significant feature distribution shifts between datasets. Experiments on synthetic and real-world data corroborate our theoretical findings.
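For readers unfamiliar with the mechanism the abstract refers to, the following is a minimal sketch of one generic DP-SGD update (per-example gradient clipping followed by Gaussian noise), not the paper's own code or experimental setup. The function names, the parameters `max_norm` and `noise_multiplier`, and the toy gradients are all assumptions chosen for illustration. It shows where the privacy noise enters the update, which is the noise the feature-to-noise ratio analysis is concerned with.

```python
import math
import random


def clip(grad, max_norm):
    """Scale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(weights, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update (illustrative): clip, sum, add noise, average, step."""
    dim = len(weights)
    total = [0.0] * dim
    for grad in per_example_grads:
        clipped = clip(grad, max_norm)  # bound each example's influence
        for i in range(dim):
            total[i] += clipped[i]
    sigma = noise_multiplier * max_norm  # noise std scales with the clip norm
    batch = len(per_example_grads)
    noisy_mean = [(total[i] + rng.gauss(0.0, sigma)) / batch for i in range(dim)]
    return [w - lr * g for w, g in zip(weights, noisy_mean)]


# Toy usage with hypothetical gradients; a seeded RNG keeps runs repeatable.
rng = random.Random(0)
new_weights = dp_sgd_step([0.0, 0.0], [[3.0, 4.0], [0.6, 0.8]],
                          lr=0.1, max_norm=1.0, noise_multiplier=1.0, rng=rng)
```

Because the noise scale is fixed by `max_norm` and `noise_multiplier` rather than by the size of the signal in each gradient, a weak feature's gradient contribution is more easily swamped by the noise than a strong feature's.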

Constitutional AI & AI Ethics | Red-Teaming & Adversarial Robustness | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 48

Year: 2026

Venue: N/A
