UW-Madison | May 5, 2026 | arXiv:2605.04209

Undetectable Backdoors in Model Parameters: Hiding Sparse Secrets in High Dimensions

Sarthak Choudhary, Atharv Singh Patlan, Nils Palumbo, Ashish Hooda, Kassem Fawaz, Somesh Jha

AI Summary

This paper introduces Sparse Backdoor, a supply-chain attack that injects provably undetectable backdoors into pre-trained image classifiers by adding structured sparse perturbations along random directions in fully connected layers. The key to formalizing undetectability is an isotropic Gaussian dither, which defines a clean reference distribution that, under a margin condition, is functionally equivalent to the original classifier. The authors prove that detecting the backdoor is at least as hard as Sparse PCA detection, giving white-box undetectability guarantees against probabilistic polynomial-time distinguishers.

Key Contribution

Backdoors injected into pre-trained image classifiers through sparse perturbations and Gaussian dithering can be provably undetectable, even by a polynomial-time distinguisher with white-box access to the parameters.

Abstract

We present Sparse Backdoor, a supply-chain attack that plants a provably undetectable backdoor in pre-trained image classifiers, including convolutional networks and Vision Transformers. The attack injects a structured sparse perturbation along a randomly chosen direction into a small subset of columns at each fully connected layer, propagating a trigger signal to an adversary-chosen target class, and masks the perturbation with an independent isotropic Gaussian dither. The dither serves a single technical purpose: it induces a clean reference distribution anchored at the pre-trained weights, against which undetectability can be formalized. Under a mild margin condition on the pre-trained classifier, we show that the dithered reference is functionally equivalent to the original classifier. We prove that distinguishing the backdoor-injected model from this reference is at least as hard as Sparse PCA detection, which is computationally infeasible under standard hardness assumptions. The guarantee holds against any probabilistic polynomial-time distinguisher with white-box access to the parameters.
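The two ingredients named in the abstract, a dithered reference and a sparse perturbation confined to a few columns, can be illustrated with a minimal sketch for a single fully connected layer stored as a NumPy matrix. The function names, the parameters `sigma`, `k`, and `strength`, and the exact form of the perturbation are illustrative assumptions and not the paper's construction, which the abstract does not specify.

```python
import numpy as np

def dithered_reference(W, sigma, rng):
    """Clean reference (assumed form): weights plus isotropic Gaussian dither."""
    return W + sigma * rng.standard_normal(W.shape)

def sparse_backdoor(W, sigma, k, strength, rng):
    """Illustrative backdoored weights: dither plus a perturbation along one
    random unit direction, confined to k randomly chosen columns."""
    out_dim, in_dim = W.shape
    u = rng.standard_normal(out_dim)
    u /= np.linalg.norm(u)                             # random unit direction
    cols = rng.choice(in_dim, size=k, replace=False)   # sparse column support
    delta = np.zeros_like(W)
    delta[:, cols] = strength * u[:, None]             # same direction in each chosen column
    dither = sigma * rng.standard_normal(W.shape)      # independent of delta
    return W + dither + delta, cols, u

# Hypothetical layer sizes and scales, chosen only for illustration.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 256)) / 16.0
W_ref = dithered_reference(W, sigma=0.05, rng=rng)
W_bd, cols, u = sparse_backdoor(W, sigma=0.05, k=8, strength=0.05, rng=rng)
```

In this toy setting the distinguisher's task is to tell `W_ref` from `W_bd` given only the matrices. The paper's claim is that, for its actual construction, this task is at least as hard as Sparse PCA detection; the sketch does not reproduce the parameter regime that makes that reduction hold.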

Architecture Design (Transformers, SSMs, MoE) | Computer Vision | Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
