Feb 17, 2026 · arXiv:2602.15671

Revisiting Backdoor Threat in Federated Instruction Tuning from a Signal Aggregation Perspective

AI Summary

This paper investigates backdoor vulnerabilities in federated instruction tuning where low-concentration poisoned data is distributed across the datasets of benign clients, a scenario increasingly relevant as training relies on unverified third-party data. The authors model the backdoor implantation process from a signal aggregation perspective and introduce the Backdoor Signal-to-Noise Ratio to quantify the dynamics of the distributed backdoor signal. Experiments show that with less than 10% of the training data poisoned, the attack success rate exceeds 85% while primary task performance remains largely intact, and that existing backdoor defenses are ineffective against this threat.

Key Contribution

Even a small share of poisoned data (under 10%) spread across benign federated clients can backdoor instruction-tuned models with an attack success rate above 85%, and state-of-the-art defenses designed for malicious-client attacks are ineffective against it.

Abstract

Federated learning security research has predominantly focused on backdoor threats from a minority of malicious clients that intentionally corrupt model updates. This paper challenges this paradigm by investigating a more pervasive and insidious threat: backdoor vulnerabilities from low-concentration poisoned data distributed across the datasets of benign clients. This scenario is increasingly common in federated instruction tuning for language models, which often rely on unverified third-party and crowd-sourced data. We analyze two forms of backdoor data through real cases: 1) natural trigger (inherent features as implicit triggers); 2) adversary-injected trigger. To analyze this threat, we model the backdoor implantation process from signal aggregation, proposing the Backdoor Signal-to-Noise Ratio to quantify the dynamics of the distributed backdoor signal. Extensive experiments reveal the severity of this threat: With just less than 10% of training data poisoned and distributed across clients, the attack success rate exceeds 85%, while the primary task performance remains largely intact. Critically, we demonstrate that state-of-the-art backdoor defenses, designed for attacks from malicious clients, are fundamentally ineffective against this threat. Our findings highlight an urgent need for new defense mechanisms tailored to the realities of modern, decentralized data ecosystems.
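
The signal aggregation intuition can be illustrated with a toy calculation. The sketch below is not the paper's Backdoor Signal-to-Noise Ratio, whose definition is not given on this page. It assumes, purely for illustration, that each client's update carries a backdoor component proportional to its local poison fraction plus independent noise of equal scale, and that the server takes a plain FedAvg-style mean. The names `spread_poison`, `toy_aggregate_snr`, and `noise_std` are hypothetical.

```python
import math


def spread_poison(samples_per_client, num_clients, poison_fraction):
    """Split a global poison budget as evenly as possible across clients.

    Toy assumption: poisoned samples land uniformly on benign clients.
    Returns the number of poisoned samples held by each client.
    """
    total = samples_per_client * num_clients
    budget = round(total * poison_fraction)
    base, extra = divmod(budget, num_clients)
    return [base + (1 if i < extra else 0) for i in range(num_clients)]


def toy_aggregate_snr(poison_counts, samples_per_client, noise_std=1.0):
    """Toy signal-to-noise ratio of the averaged update (illustrative only).

    Assumptions, not taken from the paper:
    - the backdoor component of each update is proportional to the
      client's local poison fraction and points in a shared direction;
    - benign variation is independent noise with std `noise_std`,
      so averaging k clients shrinks it by sqrt(k).
    """
    k = len(poison_counts)
    signal = sum(c / samples_per_client for c in poison_counts) / k
    noise = noise_std / math.sqrt(k)
    return signal / noise


# Same 8% global poison rate, different numbers of participating clients.
counts_10 = spread_poison(samples_per_client=100, num_clients=10, poison_fraction=0.08)
counts_40 = spread_poison(samples_per_client=100, num_clients=40, poison_fraction=0.08)
snr_10 = toy_aggregate_snr(counts_10, samples_per_client=100)
snr_40 = toy_aggregate_snr(counts_40, samples_per_client=100)
print(counts_10, snr_10, snr_40)
```

Under these assumptions, the per-client poison count stays small, yet the aggregate ratio grows with the number of clients because the shared backdoor component adds coherently while independent noise averages out. This is one way to read why low-concentration poison spread over many benign clients might still implant a backdoor; the paper's own analysis may differ.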

Distributed Systems & Hardware · Natural Language Processing · Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
