Feb 26, 2026 · arXiv:2602.22610

DP-aware AdaLN-Zero: Taming Conditioning-Induced Heavy-Tailed Gradients in Differentially Private Diffusion

Tao Huang, Jiayang Meng, Xu Yang, Chen Hou, Hong Chen

AI Summary

The paper addresses heavy-tailed per-example gradients in differentially private (DP) diffusion models. These gradients are induced by heterogeneous conditional contexts and cause excessive clipping and degraded utility under DP-SGD. To mitigate this, the authors propose DP-aware AdaLN-Zero, a sensitivity-aware conditioning mechanism that constrains both the conditioning representation magnitude and the AdaLN modulation parameters through bounded re-parameterization. This suppresses extreme gradient tail events before clipping and noise injection, and improves interpolation/imputation and forecasting performance under DP constraints.
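Heavy-tailed gradients interact with the DP-SGD aggregation step. The sketch below shows that step using the standard clip-then-noise recipe from Abadi et al. (2016), written in NumPy. The function name `dp_sgd_aggregate` and its parameters are illustrative and are not taken from the paper's code. The outlier example is synthetic.

```python
import numpy as np


def dp_sgd_aggregate(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each example's gradient to L2 norm <= clip_norm, sum, add Gaussian noise, average.

    per_example_grads: array of shape (batch, dim), one flattened gradient per example.
    """
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Per-example scale factor: 1 for small gradients, clip_norm / ||g|| for large ones.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped_sum = (per_example_grads * scale).sum(axis=0)
    # Noise std is tied to clip_norm, the per-example sensitivity of the clipped sum.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped_sum.shape)
    return (clipped_sum + noise) / per_example_grads.shape[0]


rng = np.random.default_rng(0)
grads = rng.normal(size=(8, 4))
grads[0] *= 100.0  # one synthetic outlier, standing in for an extreme conditioning context
update = dp_sgd_aggregate(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Each example's contribution is capped at `clip_norm`, which keeps the noise scale fixed. The cost is that the clipped average no longer equals the true mean gradient. This gap, the clipping bias, grows as more or larger gradients exceed the threshold.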

Key Contribution

By jointly bounding the conditioning representation magnitude and the AdaLN modulation parameters, DP-aware AdaLN-Zero suppresses conditioning-induced heavy-tailed gradients before clipping in differentially private diffusion models. This improves utility under a fixed privacy budget.
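One way to see what "taming heavy-tailed gradients" means in practice is to track a few per-batch diagnostics of the per-example gradients. The sketch below is an assumption for illustration and does not reproduce the paper's diagnostic suite. The helper `clipping_diagnostics` is hypothetical. It reports three quantities:

- the fraction of clipped examples;
- the clipping bias, measured as the distance between the raw and clipped mean gradients;
- a crude max-to-median norm ratio.

The example runs it on synthetic Gaussian and Student-t gradients.

```python
import numpy as np


def clipping_diagnostics(per_example_grads, clip_norm):
    """Hypothetical per-batch diagnostics for per-example gradients of shape (batch, dim)."""
    norms = np.linalg.norm(per_example_grads, axis=1)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped_mean = (per_example_grads * scale[:, None]).mean(axis=0)
    raw_mean = per_example_grads.mean(axis=0)
    return {
        # Share of examples whose gradient norm exceeds the clipping threshold.
        "clip_fraction": float(np.mean(norms > clip_norm)),
        # L2 distance between the unclipped and clipped mean gradients (clipping bias).
        "clip_bias": float(np.linalg.norm(raw_mean - clipped_mean)),
        # Crude tail indicator: largest norm relative to the median norm.
        "max_to_median": float(norms.max() / np.median(norms)),
    }


rng = np.random.default_rng(0)
light = rng.normal(size=(256, 16))
heavy = rng.standard_t(df=1.5, size=(256, 16))  # synthetic heavy-tailed gradients
for name, g in [("light", light), ("heavy", heavy)]:
    print(name, clipping_diagnostics(g, clip_norm=5.0))
```

To check whether a change to the conditioning pathway actually reshapes the gradient tail, compare these numbers before and after the change at a fixed `clip_norm`.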

Abstract

Condition injection enables diffusion models to generate context-aware outputs, which is essential for many time-series tasks. However, heterogeneous conditional contexts (e.g., observed history, missingness patterns or outlier covariates) can induce heavy-tailed per-example gradients. Under Differentially Private Stochastic Gradient Descent (DP-SGD), these rare conditioning-driven heavy-tailed gradients disproportionately trigger global clipping, resulting in outlier-dominated updates, larger clipping bias, and degraded utility under a fixed privacy budget. In this paper, we propose DP-aware AdaLN-Zero, a drop-in sensitivity-aware conditioning mechanism for conditional diffusion transformers that limits conditioning-induced gain without modifying the DP-SGD mechanism. DP-aware AdaLN-Zero jointly constrains conditioning representation magnitude and AdaLN modulation parameters via bounded re-parameterization, suppressing extreme gradient tail events before gradient clipping and noise injection. Empirically, DP-SGD equipped with DP-aware AdaLN-Zero improves interpolation/imputation and forecasting under matched privacy settings. We observe consistent gains on a real-world power dataset and two public ETT benchmarks over vanilla DP-SGD. Moreover, gradient diagnostics attribute these improvements to conditioning-specific tail reshaping and reduced clipping distortion, while preserving expressiveness in non-private training. Overall, these results show that sensitivity-aware conditioning can substantially improve private conditional diffusion training without sacrificing standard performance.
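The abstract does not give the exact bounded re-parameterization. The following is therefore only a minimal NumPy sketch under two assumptions:

- The conditioning vector is rescaled to an L2 radius `radius`.
- The AdaLN shift, scale, and gate are squashed with a scaled `tanh` so that each lies in (-mod_limit, mod_limit).

The class `BoundedAdaLNZero`, the helpers `bound_norm` and `soft_bound`, and both limits are hypothetical names. The SiLU followed by a zero-initialized modulation projection follows the standard AdaLN-Zero design from DiT, so the block starts as the identity map. For brevity, the sketch models one residual branch and drops the token axis.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    # Parameter-free LayerNorm over the last axis, applied before AdaLN modulation.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def bound_norm(c, radius):
    # Hypothetical: rescale each conditioning vector so its L2 norm is at most `radius`.
    norms = np.linalg.norm(c, axis=-1, keepdims=True)
    return c * np.minimum(1.0, radius / np.maximum(norms, 1e-12))


def soft_bound(v, limit):
    # Hypothetical: smooth tanh re-parameterization keeping every entry in [-limit, limit].
    return limit * np.tanh(v / limit)


class BoundedAdaLNZero:
    """Sketch of AdaLN-Zero modulation with bounded conditioning (assumed form, one branch)."""

    def __init__(self, cond_dim, hidden_dim, radius=1.0, mod_limit=1.0):
        self.radius = radius
        self.mod_limit = mod_limit
        # AdaLN-Zero: the modulation projection starts at zero, so gate = 0 and the
        # residual branch contributes nothing at initialization.
        self.W = np.zeros((cond_dim, 3 * hidden_dim))
        self.b = np.zeros(3 * hidden_dim)

    def __call__(self, x, c, branch):
        c_bar = bound_norm(c, self.radius)
        silu = c_bar / (1.0 + np.exp(-c_bar))  # SiLU(c_bar) = c_bar * sigmoid(c_bar)
        raw = silu @ self.W + self.b
        shift, scale, gate = np.split(soft_bound(raw, self.mod_limit), 3, axis=-1)
        h = layer_norm(x) * (1.0 + scale) + shift
        return x + gate * branch(h)


rng = np.random.default_rng(0)
block = BoundedAdaLNZero(cond_dim=8, hidden_dim=16)
x = rng.normal(size=(4, 16))
c = 1e4 * rng.normal(size=(4, 8))  # synthetic extreme conditioning context
out = block(x, c, branch=np.tanh)  # np.tanh stands in for an attention/MLP branch
```

The gradient of the loss with respect to `W` is an outer product of `silu(c_bar)` with the upstream gradient. Because `|silu(z)| <= |z|`, capping `||c_bar||` also caps how much a single extreme context can inflate that gradient. The `tanh` bound additionally limits how strongly any context can shift, rescale, or gate activations.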

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 18

Year: 2026

Venue: N/A
