Feb 26, 2026 · arXiv:2602.22611

Mitigating Membership Inference in Intermediate Representations via Layer-wise MIA-risk-aware DP-SGD

Jiayang Meng, Tao Huang, Chen Hou, Guolong Zheng, Hong Chen

AI Summary

The paper introduces Layer-wise MIA-risk-aware DP-SGD (LM-DP-SGD), a differentially private training method that allocates privacy protection across the layers of a neural network according to each layer's vulnerability to membership inference attacks (MIAs). LM-DP-SGD estimates layer-specific MIA risk by training a shadow model on a public shadow dataset and fitting a separate MIA adversary on each layer's intermediate representations (IRs). It then uses these risk estimates to reweight each layer's contribution to the globally clipped gradient during DP-SGD training, keeping the noise magnitude fixed. The authors report that, under the same privacy budget, LM-DP-SGD reduces the peak IR-level MIA risk while preserving utility, and they provide theoretical guarantees on both privacy and convergence.
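The risk-estimation step can be illustrated with a deliberately simple stand-in. The sketch below assumes each layer's IR has already been reduced to one scalar score per example; that reduction, the function name `threshold_attack_error`, and the single-threshold attack itself are hypothetical choices for illustration, and the paper's layer-specific adversaries may look quite different. It reports the in-sample error of the best threshold attack for one layer, where a lower error means that layer separates members from non-members more easily.

```python
import numpy as np

def threshold_attack_error(member_scores, nonmember_scores):
    """Error rate of the best single-threshold membership attack (in-sample).

    member_scores / nonmember_scores: 1-D arrays of a scalar score derived
    from one layer's IRs (hypothetical feature, e.g. an activation norm).
    """
    scores = np.concatenate([member_scores, nonmember_scores])
    labels = np.concatenate([
        np.ones(len(member_scores)),     # 1 = member of the shadow training split
        np.zeros(len(nonmember_scores))  # 0 = held-out shadow example
    ])
    best_acc = 0.0
    for t in np.unique(scores):
        # Try both orientations: "high score => member" and "low score => member".
        for pred in (scores >= t, scores < t):
            best_acc = max(best_acc, float(np.mean(pred == labels)))
    return 1.0 - best_acc

# One error rate per layer gives a per-layer risk profile (toy numbers).
layer_scores = {
    "layer1": (np.array([2.0, 3.0]), np.array([0.0, 1.0])),  # well separated
    "layer2": (np.array([1.0, 1.0]), np.array([1.0, 1.0])),  # indistinguishable
}
errors = {name: threshold_attack_error(m, n) for name, (m, n) in layer_scores.items()}
```

A faithful version would fit the adversary on one part of the shadow data and measure its error on another; the in-sample shortcut here only keeps the example short.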

Key Contribution

By allocating differential-privacy protection according to layer-specific MIA vulnerability, LM-DP-SGD reduces the peak IR-level MIA risk while preserving utility under the same privacy budget, improving the privacy-utility trade-off over standard DP-SGD.

Abstract

In Embedding-as-an-Interface (EaaI) settings, pre-trained models are queried for Intermediate Representations (IRs). The distributional properties of IRs can leak training-set membership signals, enabling Membership Inference Attacks (MIAs) whose strength varies across layers. Although Differentially Private Stochastic Gradient Descent (DP-SGD) mitigates such leakage, existing implementations employ per-example gradient clipping and a uniform, layer-agnostic noise multiplier, ignoring heterogeneous layer-wise MIA vulnerability. This paper introduces Layer-wise MIA-risk-aware DP-SGD (LM-DP-SGD), which adaptively allocates privacy protection across layers in proportion to their MIA risk. Specifically, LM-DP-SGD trains a shadow model on a public shadow dataset, extracts per-layer IRs from its train/test splits, and fits layer-specific MIA adversaries, using their attack error rates as MIA-risk estimates. Leveraging the cross-dataset transferability of MIAs, these estimates are then used to reweight each layer's contribution to the globally clipped gradient during private training, providing layer-appropriate protection under a fixed noise magnitude. We further establish theoretical guarantees on both privacy and convergence of LM-DP-SGD. Extensive experiments show that, under the same privacy budget, LM-DP-SGD reduces the peak IR-level MIA risk while preserving utility, yielding a superior privacy-utility trade-off.
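One way to picture "reweighting each layer's contribution to the globally clipped gradient under a fixed noise magnitude" is the NumPy sketch below. It is an assumption-laden illustration, not the paper's algorithm: the function names, the choice to scale layers before clipping, and the example weights are all hypothetical. It does show one property that holds regardless of the weights: because clipping is applied after the per-layer scaling, each example's contribution still has global L2 norm at most `clip_norm`, so the Gaussian noise scale `noise_multiplier * clip_norm` is unchanged.

```python
import numpy as np

def reweight_and_clip(per_example_grads, layer_weights, clip_norm):
    """Scale each layer's per-example gradients, then clip the global norm.

    per_example_grads: list over layers of arrays shaped (batch, dim_l).
    layer_weights: one non-negative scalar per layer (hypothetical values).
    """
    scaled = [w * g for w, g in zip(layer_weights, per_example_grads)]
    # Per-example L2 norm taken jointly over all layers.
    norms = np.sqrt(sum((g ** 2).sum(axis=1) for g in scaled))
    factor = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    return [g * factor[:, None] for g in scaled]

def noisy_gradient(per_example_grads, layer_weights, clip_norm, noise_multiplier, rng):
    """Sum clipped gradients, add Gaussian noise, and average over the batch."""
    batch = per_example_grads[0].shape[0]
    clipped = reweight_and_clip(per_example_grads, layer_weights, clip_norm)
    out = []
    for g in clipped:
        total = g.sum(axis=0)
        # Same noise standard deviation for every layer (fixed noise magnitude).
        noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
        out.append((total + noise) / batch)
    return out

rng = np.random.default_rng(0)
grads = [10 * rng.normal(size=(4, 3)), 10 * rng.normal(size=(4, 5))]
weights = [0.5, 1.0]  # hypothetical: smaller weight for the layer judged riskier
update = noisy_gradient(grads, weights, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Under this scheme a down-weighted layer contributes less signal relative to the same noise, which is one plausible reading of "layer-appropriate protection"; how the paper actually maps risk estimates to weights is not specified in the abstract.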

Constitutional AI & AI Ethics · Red-Teaming & Adversarial Robustness · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 34

Year: 2026

Venue: N/A
