TU Darmstadt | Mar 4, 2026 | arXiv:2603.04064

Tuning Just Enough: Lightweight Backdoor Attacks on Multi-Encoder Diffusion Models

Ziyuan Chen, Yujin Jeong, Tobias Braun, Anna Rohrbach

AI Summary

This paper investigates backdoor vulnerabilities in Stable Diffusion 3, a multi-encoder text-to-image diffusion model, by defining four attack target categories and identifying the minimal set of encoders required for each. The authors introduce Multi-Encoder Lightweight aTtacks (MELT), which trains only low-rank adapters while keeping the pretrained text encoder weights frozen. The study shows that tuning fewer than 0.2% of the total encoder parameters is sufficient for successful backdoor attacks, exposing a vulnerability in multi-encoder diffusion models.

Key Contribution

Despite the complexity of Stable Diffusion 3's multi-encoder architecture, a backdoor can be implanted by tuning fewer than 0.2% of the total encoder parameters.

Abstract

As text-to-image diffusion models are increasingly deployed in real-world applications, concerns about backdoor attacks have gained significant attention. Prior work on text-based backdoor attacks has largely focused on diffusion models conditioned on a single lightweight text encoder. However, more recent diffusion models that incorporate multiple large-scale text encoders remain underexplored in this context. Given the substantially increased number of trainable parameters introduced by multiple text encoders, an important question is whether backdoor attacks can remain both efficient and effective in such settings. In this work, we study Stable Diffusion 3, which uses three distinct text encoders and has not yet been systematically analyzed for text-encoder-based backdoor vulnerabilities. To understand the role of text encoders in backdoor attacks, we define four categories of attack targets and identify the minimal sets of encoders required to achieve effective performance for each attack objective. Based on this, we further propose Multi-Encoder Lightweight aTtacks (MELT), which trains only low-rank adapters while keeping the pretrained text encoder weights frozen. We demonstrate that tuning fewer than 0.2% of the total encoder parameters is sufficient for successful backdoor attacks on Stable Diffusion 3, revealing previously underexplored vulnerabilities in practical attack scenarios in multi-encoder settings.
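To make the "low-rank adapters on frozen weights" idea concrete, the following is a minimal pure-Python sketch of a generic LoRA-style layer and of how the trainable-parameter fraction can be estimated. It is not the authors' implementation: the names `LoRALinear`, `lora_param_count`, and `trainable_fraction`, the layer shape (4096 x 4096), and the rank (4) are all illustrative assumptions chosen only to show how a fraction below 0.2% can arise.

```python
import random


def lora_param_count(d_in, d_out, rank):
    """Parameters in one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def trainable_fraction(layer_shapes, rank):
    """Adapter parameters divided by frozen base parameters.

    layer_shapes is a list of (d_in, d_out) pairs. The shapes used below
    are hypothetical and are not taken from Stable Diffusion 3.
    """
    frozen = sum(d_in * d_out for d_in, d_out in layer_shapes)
    trainable = sum(lora_param_count(d_in, d_out, rank)
                    for d_in, d_out in layer_shapes)
    return trainable / frozen


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen: never updated during adapter training
        # A starts small and random, B starts at zero, so the adapted layer
        # initially reproduces the frozen layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)]
                  for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]


# Hypothetical example: one 4096 x 4096 projection with a rank-4 adapter.
fraction = trainable_fraction([(4096, 4096)], rank=4)  # 1/512, about 0.195%

layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1)
output = layer.forward([1.0, 1.0])  # equals the frozen output while B is zero
```

In this toy setting only `A` and `B` would receive gradient updates, so the number of tuned parameters grows with the rank rather than with the size of the frozen weight matrix, which is why adapter-based tuning can stay small even when several large encoders are involved.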

Computer Vision | Multimodal Models | Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
