HKU · Institute of Artificial Intelligence · Northwestern · Telecom · May 21, 2026 · arXiv:2605.22060

Safeguarding Text-to-Image Generative Models Against Unauthorized Knowledge Distillation

Yilan Gao, Sida Huang, Hongyuan Zhang, Xuelong Li

AI Summary

The paper introduces WaveGuard, a defense mechanism against unauthorized knowledge distillation in text-to-image generative models deployed via APIs. WaveGuard injects frequency-aware, imperceptible perturbations into generated images, controlled by a user-specified perturbation budget. Experiments on WikiArt-related synthetic outputs demonstrate that WaveGuard effectively reduces the utility of protected images for training unauthorized student models while maintaining visual fidelity and scaling efficiently.
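The summary does not say which norm the perturbation budget uses. If it is assumed to be an L∞ bound ε on the per-pixel change for images scaled to [0, 1], the budget directly implies a worst-case fidelity guarantee:

```latex
\|x' - x\|_\infty \le \varepsilon
\;\Longrightarrow\;
\mathrm{MSE}(x', x) = \frac{1}{N}\sum_{i=1}^{N} (x'_i - x_i)^2 \le \varepsilon^2
\;\Longrightarrow\;
\mathrm{PSNR}(x', x) = 10 \log_{10} \frac{1}{\mathrm{MSE}(x', x)} \ge 20 \log_{10} \frac{1}{\varepsilon}.
```

For example, ε = 8/255 guarantees PSNR ≥ 20 log10(31.875) ≈ 30.1 dB, whatever perturbation the generator produces within that budget.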

Key Contribution

Imperceptible frequency-aware perturbations can effectively thwart model stealing in text-to-image generative models without sacrificing visual quality.

Abstract

Closed-weight generative services are increasingly deployed through query-based APIs, where users can obtain generated outputs while model parameters remain inaccessible. However, such deployment does not prevent model stealing: an attacker can repeatedly query the service, collect large volumes of released synthetic images, and use them as training data for a private substitute model. This query-output-driven process enables unauthorized knowledge distillation and capability replication without direct access to the original weights. To mitigate this threat, a practical defense should preserve the visual fidelity of released images, provide explicit control over perturbation magnitude, and scale efficiently to large-volume output release. We present WaveGuard, a single-pass, generator-based protection framework that safeguards released synthetic images under a user-specified perturbation budget. WaveGuard employs a frequency-aware perturbation generator to inject structured, imperceptible perturbations that maintain perceptual utility for benign viewers while reducing the usefulness of protected images as training data for unauthorized student models. Extensive experiments under WikiArt-related synthetic-output distillation settings show that WaveGuard achieves a favorable efficacy–fidelity–efficiency trade-off, with explicit imperceptibility control and substantial gains in protection efficiency.
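The abstract does not specify the generator architecture, the frequency transform, or the norm of the budget. The sketch below is a hypothetical, training-free illustration of only the two mechanical properties it names: a perturbation confined to high-frequency content, and a hard, user-specified budget ε. It assumes a one-level Haar wavelet decomposition (an assumption suggested only by the name WaveGuard), a single-channel image in [0, 1], an L∞ budget, and a random array standing in for the learned generator's output. The names `haar_dwt2`, `haar_idwt2`, `budgeted_delta`, and `protect` are illustrative, not WaveGuard's API, and the sketch does not model the anti-distillation objective the real generator is trained for.

```python
import numpy as np


def haar_dwt2(x: np.ndarray):
    """One-level orthonormal 2D Haar transform of an (H, W) array (H, W even).

    Returns the subbands (LL, LH, HL, HH), each of shape (H/2, W/2).
    """
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0  # local average: low frequencies
    lh = (a - b + c - d) / 2.0  # horizontal differences
    hl = (a + b - c - d) / 2.0  # vertical differences
    hh = (a - b - c + d) / 2.0  # diagonal differences
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh) -> np.ndarray:
    """Exact inverse of haar_dwt2 (the 2x2 block transform is orthonormal)."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=np.float64)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


def budgeted_delta(raw: np.ndarray, eps: float) -> np.ndarray:
    """Hypothetical: keep only the high-frequency part of `raw`, then scale it to the L-inf budget."""
    _, lh, hl, hh = haar_dwt2(raw)
    # Zero the LL subband so the perturbation carries no coarse (low-frequency) structure.
    delta = haar_idwt2(np.zeros_like(lh), lh, hl, hh)
    peak = np.abs(delta).max()
    if peak == 0.0:
        return delta
    # Linear rescaling keeps LL at zero and makes the largest pixel change exactly eps.
    return delta * (eps / peak)


def protect(image: np.ndarray, raw: np.ndarray, eps: float) -> np.ndarray:
    """Add the budgeted perturbation to an image in [0, 1].

    Clipping back to [0, 1] can only shrink |x' - x|, so the eps budget still holds.
    """
    return np.clip(image + budgeted_delta(raw, eps), 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.uniform(0.0, 1.0, size=(64, 64))  # stand-in for a generated image
    raw = rng.normal(size=(64, 64))             # stand-in for a learned generator's output
    eps = 8 / 255
    out = protect(img, raw, eps)
    print("max |x' - x| =", np.abs(out - img).max(), "budget =", eps)
```

Linear rescaling is used instead of a squashing nonlinearity such as tanh because it keeps the zeroed low-frequency subband exactly zero while still enforcing the budget; a learned generator would instead shape the high-frequency content to degrade student training.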

Data Curation & Synthetic Data · Inference & Quantization · Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
