Feb 25, 2026 · arXiv:2602.21977

When LoRA Betrays: Backdooring Text-to-Image Models by Masquerading as Benign Adapters

Liangwei Lyu, Jiaqi Xu, Jianwei Ding, Qiyao Deng

AI Summary

The paper introduces Masquerade-LoRA (MasqLoRA), a novel attack framework that exploits LoRA modules to inject backdoors into text-to-image diffusion models by training a standalone LoRA adapter on trigger word-target image pairs while freezing the base model. This method allows attackers to create LoRA modules that, when loaded and triggered by specific text, generate predefined images, while otherwise behaving normally, thus maintaining stealth. Experiments demonstrate MasqLoRA achieves a high attack success rate (99.8%) with minimal resource overhead, highlighting a significant vulnerability in the LoRA-based AI model sharing ecosystem.

Key Contribution

LoRA's plug-and-play simplicity makes text-to-image models dangerously vulnerable: a single, seemingly benign LoRA module can be a stealthy backdoor with near-perfect success.

Abstract

Low-Rank Adaptation (LoRA) has emerged as a leading technique for efficiently fine-tuning text-to-image diffusion models, and its widespread adoption on open-source platforms has fostered a vibrant culture of model sharing and customization. However, the same modular and plug-and-play flexibility that makes LoRA appealing also introduces a broader attack surface. To highlight this risk, we propose Masquerade-LoRA (MasqLoRA), the first systematic attack framework that leverages an independent LoRA module as the attack vehicle to stealthily inject malicious behavior into text-to-image diffusion models. MasqLoRA operates by freezing the base model parameters and updating only the low-rank adapter weights using a small number of "trigger word-target image" pairs. This enables the attacker to train a standalone backdoor LoRA module that embeds a hidden cross-modal mapping: when the module is loaded and a specific textual trigger is provided, the model produces a predefined visual output; otherwise, it behaves indistinguishably from the benign model, ensuring the stealthiness of the attack. Experimental results demonstrate that MasqLoRA can be trained with minimal resource overhead and achieves a high attack success rate of 99.8%. MasqLoRA reveals a severe and unique threat in the AI supply chain, underscoring the urgent need for dedicated defense mechanisms for the LoRA-centric sharing ecosystem.
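The training setup the abstract describes, a frozen base model with only the low-rank adapter weights updated, is the standard LoRA mechanism. A minimal sketch of that generic mechanism follows, assuming a single toy linear layer in NumPy rather than a diffusion model. It is not the authors' MasqLoRA code; the class `LoRALinear`, its `sgd_step` helper, the squared-error loss, and all hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np


class LoRALinear:
    """Toy linear layer y = W x + (alpha / r) * B (A x), with W kept frozen."""

    def __init__(self, weight, rank=2, alpha=2.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight.copy()                          # frozen base weight
        self.A = rng.normal(0.0, 0.1, size=(rank, in_dim))   # trainable down-projection
        self.B = np.zeros((out_dim, rank))                   # trainable up-projection, zero init
        self.scale = alpha / rank

    def forward(self, x):
        # Base output plus the scaled low-rank correction.
        return self.weight @ x + self.scale * (self.B @ (self.A @ x))

    def sgd_step(self, x, target, lr=0.1):
        """One SGD step on 0.5 * ||forward(x) - target||^2; only A and B change."""
        err = self.forward(x) - target
        grad_B = self.scale * np.outer(err, self.A @ x)
        grad_A = self.scale * np.outer(self.B.T @ err, x)
        self.B -= lr * grad_B
        self.A -= lr * grad_A
        return 0.5 * float(err @ err)   # loss measured before this update


base = np.eye(3)                       # stand-in for pretrained weights
layer = LoRALinear(base, rank=2)
x = np.array([1.0, 0.0, 0.0])
target = np.array([0.0, 1.0, 0.0])

initial_out = layer.forward(x)         # equals base @ x because B starts at zero
losses = [layer.sgd_step(x, target) for _ in range(300)]
```

Because `B` is initialized to zero, the adapter starts as a no-op, and because `weight` is never written to, everything learned lives in the small `A` and `B` matrices. That separation is what lets a LoRA module be distributed on its own, which is the property the abstract identifies as an attack surface.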

Multimodal Models · Open-Source Models & Weights · Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
