Tsinghua AI | BUPT | May 26, 2026 | arXiv:2605.27020

Black-box Membership Inference Attacks on the Pre-training Data of Image-generation Models

Tao Qi, Huili Wang, Yuanhong Huang, Wendan Wang, Lianchao Zhao, Jinrui Wang, Zichen Qin, Shangguang Wang, Yongfeng Huang

AI Summary

This paper introduces SD-MIA, a black-box membership inference attack framework, to detect unauthorized data usage in diffusion models' pre-training data. SD-MIA leverages a cross-modal data perturbation mechanism, analyzing how the model denoises a target image and perturbed textual instructions to reveal membership cues. Experiments on public and newly constructed datasets demonstrate SD-MIA's superior performance compared to existing baselines, even those with access to internal model features.

Key Contribution

You can now detect unauthorized training data in closed-source image generation models with higher accuracy, even without access to internal model features.

Abstract

The rapid advancement of diffusion-based image generation models has raised serious concerns regarding potential copyright and privacy infringements involving human-created data. Membership inference attacks (MIAs) have emerged as a promising tool for identifying unauthorized data usage during model training. Existing methods typically assess a model's ability to denoise perturbed suspect images as an indicator of membership status. However, the discriminative power of such features is highly dependent on the degree of model memorization and deteriorates significantly when applied to less exposed data (e.g., pre-training data). Although several methods attempt to enhance detection by leveraging internal model features, these features are generally inaccessible in mainstream closed-source image generation platforms, limiting their practicality. In this paper, we demonstrate that analyzing how a black-box diffusion model denoises a target image and corresponding perturbed textual instructions can reveal more distinctive membership cues. Based on this insight, we propose a black-box membership inference attack framework (named SD-MIA) that leverages a cross-modal data perturbation mechanism to detect pre-training data in diffusion models. We conduct extensive experiments on both a public benchmark dataset and a newly constructed dataset, each comprising pre-training membership and non-membership samples with identical distributions. Experimental results demonstrate that SD-MIA achieves superior performance compared to existing baselines, including those with the unfair advantage of accessing internal model features.
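The general idea of a black-box, output-only membership signal can be illustrated with a toy sketch. This is not the SD-MIA algorithm, whose details are not given in the abstract. Everything here is a hypothetical stand-in: `regenerate` plays the role of a closed-source image-to-image endpoint, `perturb_prompt` drops one word as a placeholder for textual perturbation, and the scoring direction (lower reconstruction error under perturbed prompts means more member-like) is an assumption made only for illustration.

```python
import numpy as np

def perturb_prompt(prompt, rng):
    """Hypothetical text perturbation: drop one randomly chosen word."""
    words = prompt.split()
    if len(words) > 1:
        drop = rng.integers(len(words))
        words = words[:drop] + words[drop + 1:]
    return " ".join(words)

def membership_score(image, prompt, regenerate, n_trials=4, seed=0):
    """Toy score from black-box outputs only.

    Assumption (not from the paper): an image seen in training is
    reconstructed more faithfully even when the prompt is perturbed,
    so the score is the negative mean squared reconstruction error.
    """
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_trials):
        output = regenerate(image, perturb_prompt(prompt, rng))
        errors.append(float(np.mean((output - image) ** 2)))
    return -float(np.mean(errors))

def is_member(score, threshold):
    """Threshold decision; the threshold would be tuned on held-out data."""
    return score >= threshold

def toy_regenerate(memorized):
    """Stand-in for a closed-source model, used only to exercise the code.

    Memorized images come back unchanged; any other image is flattened
    to its mean value, regardless of the prompt.
    """
    def regenerate(image, prompt):
        if any(np.array_equal(image, m) for m in memorized):
            return image.copy()
        return np.full_like(image, image.mean())
    return regenerate

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    member, non_member = rng.random((8, 8)), rng.random((8, 8))
    box = toy_regenerate([member])
    for name, img in [("member", member), ("non-member", non_member)]:
        score = membership_score(img, "a red car on a road", box)
        print(name, score, is_member(score, threshold=-0.01))
```

In a real setting, `regenerate` would wrap calls to the target platform, and the threshold would be calibrated on samples with known membership status; the toy model above only exists so the sketch runs end to end.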

Computer Vision | Data Curation & Synthetic Data | Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
