This paper introduces Cluster Segregation Concealment (CSC), a novel defense against poisoning attacks that leverages the observation that poisoned samples form isolated clusters in the latent space early in training. CSC identifies these anomalous clusters based on class diversity and density, then relabels the poisoned samples to a virtual class and fine-tunes the classifier to replace the backdoor association with a benign one. Experiments across four datasets and twelve attacks demonstrate that CSC significantly outperforms existing defenses, achieving near-zero attack success rates with minimal accuracy loss.
Poisoning attacks got you down? This defense flips the script by turning the poisoned samples' own clustering behavior against the attacker, driving attack success rates to near zero with minimal accuracy loss.
Poisoning-based backdoor attacks pose significant threats to deep neural networks by embedding triggers in training data, causing models to misclassify triggered inputs as adversary-specified labels while maintaining performance on clean data. Existing poison restraint-based defenses often fail to detect specific attack variants and compromise model utility through unlearning methods that degrade accuracy. This paper conducts a comprehensive analysis of backdoor attack dynamics during model training, revealing that poisoned samples form isolated clusters in latent space early on, with triggers acting as dominant features distinct from benign ones. Leveraging these insights, we propose Cluster Segregation Concealment (CSC), a novel poison suppression defense. CSC first trains a deep neural network via standard supervised learning while segregating poisoned samples through feature extraction from early epochs, DBSCAN clustering, and identification of anomalous clusters based on class diversity and density metrics. In the concealment stage, identified poisoned samples are relabeled to a virtual class, and the model's classifier is fine-tuned with cross-entropy loss so that the backdoor association is replaced by a benign virtual linkage, preserving overall accuracy. Evaluated on four benchmark datasets against twelve poisoning-based attacks, CSC outperforms nine state-of-the-art defenses, reducing average attack success rates to near zero with minimal clean accuracy loss. Contributions include robust identification of backdoor patterns, an effective concealment mechanism, and strong empirical validation, advancing trustworthy artificial intelligence.
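The segregation stage lends itself to a short illustration. The sketch below is a minimal Python rendering of the idea (DBSCAN over early-epoch latent features, then anomaly scoring by class diversity and density), not the paper's implementation; the function name, all hyperparameters, and the direction of each threshold test are assumptions made for illustration.

```python
# Minimal sketch of the segregation stage, assuming latent features have
# already been extracted from an early training epoch. All hyperparameters
# (eps, min_samples, and both anomaly thresholds) are illustrative guesses,
# not values from the paper.
import numpy as np
from sklearn.cluster import DBSCAN

def flag_suspect_samples(features, labels, eps=0.5, min_samples=10,
                         diversity_thresh=0.1, spread_thresh=1.0):
    """Return a boolean mask over samples flagged as potentially poisoned.

    features: (N, D) array of early-epoch latent features.
    labels:   (N,) array of integer training labels.
    """
    cluster_ids = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(features)
    suspect = np.zeros(len(labels), dtype=bool)
    for cid in np.unique(cluster_ids):
        if cid == -1:
            continue  # DBSCAN noise points are left alone
        member = cluster_ids == cid
        # Class diversity: fraction of members outside the majority label.
        counts = np.bincount(labels[member])
        diversity = 1.0 - counts.max() / member.sum()
        # Density proxy: mean distance to the cluster centroid (lower = tighter).
        centroid = features[member].mean(axis=0)
        spread = np.linalg.norm(features[member] - centroid, axis=1).mean()
        # Assumed decision rule: a tight, nearly label-pure cluster is treated
        # as anomalous, consistent with the trigger acting as a dominant
        # shared feature that pulls poisoned samples together.
        if diversity < diversity_thresh and spread < spread_thresh:
            suspect[member] = True
    return suspect
```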
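The concealment stage can be sketched in the same spirit. The PyTorch snippet below assumes a model split into a feature extractor `backbone` and a linear head `classifier`, a data loader yielding (index, input, label) triples, and a `torch.BoolTensor` mask over the dataset (e.g., the output of the segregation sketch); freezing the backbone and fine-tuning only an extended head are illustrative choices, not details confirmed by the abstract.

```python
# Minimal sketch of the concealment stage under the assumptions stated above.
import torch
import torch.nn as nn

def conceal_backdoor(backbone, classifier, loader, suspect_mask,
                     num_classes, epochs=5, lr=1e-3, device="cpu"):
    # Extend the classifier head with one extra logit for the virtual class,
    # copying the existing class weights so clean accuracy is preserved.
    head = nn.Linear(classifier.in_features, num_classes + 1).to(device)
    with torch.no_grad():
        head.weight[:num_classes] = classifier.weight
        head.bias[:num_classes] = classifier.bias

    backbone.eval()
    for p in backbone.parameters():
        p.requires_grad_(False)  # only the head is fine-tuned here

    opt = torch.optim.SGD(head.parameters(), lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for idx, x, y in loader:
            x, y = x.to(device), y.to(device).clone()
            # Relabel flagged samples to the virtual class so the trigger is
            # re-associated with a benign output instead of the target label.
            y[suspect_mask[idx].to(device)] = num_classes
            loss = ce(head(backbone(x)), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head
```

Relabeling rather than unlearning is what preserves clean accuracy here: the classifier keeps its benign decision boundaries and merely learns a new, harmless destination for the trigger feature.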