Siquan Huang

K validation set Deng et al. (2009) using CLIP Radford et al. (2021), evaluated with both ResNet-50 He et al. (2016) and ViT-B/32 Dosovitskiy et al. (2020) as vision encoders (denoted BadCLIP (RN) and BadCLIP (ViT)); and BadVision on COCO Caption Chen et al. (2015) with LLaVA-1.5 Liu et al. (2024), which uses a CLIP ViT-L-336px encoder Radford et al. (2021). We poison the test set so that 5% of its samples are backdoor samples. Further details are provided in the Appendix.

Defense Baselines. We compare our method with several state-of-the-art (SOTA) defenses for vision encoders, covering both detection and purification methods. DeDe Hou et al. (2025) detects backdoor samples by training a decoder on an auxiliary dataset; since we assume no access to the original training data, we run it in its out-of-distribution (OOD) mode with STL-10 Coates et al. (2011) as the auxiliary dataset. PatchProcessing Doan et al. (2023) identifies backdoor samples by manipulating image patches and monitoring for label flips. ZIP Shi et al. (2023) applies linear transformations to the input and then restores the transformed image using diffusion models. SampDetox Yang et al. (2024) removes backdoors by adding noise to the input and then denoising it with diffusion models. For the purification methods (ZIP and SampDetox), we use the same diffusion models as in their original papers.

Evaluation Metrics. To evaluate backdoor sample detection, we adopt two binary classification metrics: True Positive Rate (TPR) and False Positive Rate (FPR). Here TP and FN count backdoor samples that are and are not flagged, and FP and TN count clean images that are and are not flagged. The TPR (i.e., recall) is computed as $\mathrm{TPR} = \frac{TP}{TP + FN}$ and measures the proportion of backdoor samples correctly identified as backdoors; a higher TPR indicates a more effective detector. The FPR is computed as $\mathrm{FPR} = \frac{FP}{FP + TN}$ and measures the proportion of clean images incorrectly flagged as backdoors; a lower FPR indicates a smaller negative impact of the detector on the primary task. To assess the overall effect of each defense, we use two commonly adopted metrics: Attack Success Rate (ASR) and Clean Accuracy (CA).
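The TPR and FPR definitions above can be made concrete with a short script. The following is a minimal Python sketch, not the authors' evaluation code: the function name `detection_rates` is hypothetical, the inputs are assumed to be per-sample ground-truth labels and detector flags, and returning 0.0 when a denominator is empty is an assumption made here. The toy numbers (100 samples, 5% of them backdoors, matching the poisoning rate above) are illustrative only and are not results from the paper.

```python
from typing import Sequence, Tuple


def detection_rates(is_backdoor: Sequence[bool], flagged: Sequence[bool]) -> Tuple[float, float]:
    """Return (TPR, FPR) for a backdoor-sample detector (hypothetical helper).

    is_backdoor[i]: ground truth, True if sample i carries a backdoor trigger.
    flagged[i]:     detector output, True if sample i was flagged as a backdoor.
    """
    if len(is_backdoor) != len(flagged):
        raise ValueError("is_backdoor and flagged must have the same length")
    pairs = list(zip(is_backdoor, flagged))
    tp = sum(1 for y, p in pairs if y and p)          # backdoor samples caught
    fn = sum(1 for y, p in pairs if y and not p)      # backdoor samples missed
    fp = sum(1 for y, p in pairs if not y and p)      # clean images wrongly flagged
    tn = sum(1 for y, p in pairs if not y and not p)  # clean images correctly passed
    # Assumption: report 0.0 when a denominator is empty instead of raising.
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Toy example (illustrative only): 100 samples, 5 of them backdoors.
labels = [True] * 5 + [False] * 95
flags = [True] * 4 + [False] + [True] * 2 + [False] * 93  # 4 of 5 caught, 2 clean flagged
tpr, fpr = detection_rates(labels, flags)
print(f"TPR={tpr:.2f}, FPR={fpr:.4f}")  # TPR=0.80, FPR=0.0211
```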
ASR is the proportion of backdoor samples on which the attack succeeds, while CA is the accuracy on clean images. When evaluating detection methods, any sample flagged as a backdoor, whether it is truly a backdoor or benign, is counted as a classification failure: flagging a clean image lowers CA, and flagging a backdoor sample lowers ASR. A high CA indicates minimal impact on the model, whereas a low ASR indicates an effective defense. For image captioning, we report the CIDEr score Vedantam et al. (2015) in place of CA. For captioning ASR, following Liu and Zhang (2025), an attack is counted as successful if the main concept of the attack target appears in the generated caption.

Implementation Details. To facilitate reproducibility, we provide the following experimental details. For sample processing, we partition images of any size into
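The ASR and CA accounting described above, including the rule that any flagged sample counts as a classification failure, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the paper's implementation: the `Result` container and both function names are hypothetical, and the captioning check uses a simple case-insensitive substring match as a stand-in for the concept-matching rule of Liu and Zhang (2025), whose exact procedure is not reproduced here. All example values are toy inputs, not results from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Result:
    """Hypothetical per-sample record of one defended inference."""
    prediction: str              # model output: a class name, or a caption for captioning
    flagged: bool                # True if the detection defense flagged this input
    is_backdoor: bool            # ground truth: True for trigger-carrying inputs
    label: Optional[str] = None  # correct class for clean inputs; unused for backdoor inputs


def clean_accuracy(results: List[Result]) -> float:
    """CA: share of clean inputs classified correctly; a flagged input counts as a failure."""
    clean = [r for r in results if not r.is_backdoor]
    correct = sum(1 for r in clean if not r.flagged and r.prediction == r.label)
    return correct / len(clean) if clean else 0.0


def attack_success_rate(results: List[Result], target: str, captioning: bool = False) -> float:
    """ASR: share of backdoor inputs on which the attack succeeds.

    A flagged input counts as a failure, so it never contributes to ASR.
    For captioning, success is approximated by a case-insensitive substring
    match of the target concept in the caption (an assumption, see lead-in).
    """
    backdoor = [r for r in results if r.is_backdoor]

    def succeeded(r: Result) -> bool:
        if r.flagged:
            return False
        if captioning:
            return target.lower() in r.prediction.lower()
        return r.prediction == target

    hits = sum(1 for r in backdoor if succeeded(r))
    return hits / len(backdoor) if backdoor else 0.0


# Toy classification example (illustrative values only).
results = [
    Result("dog", flagged=False, is_backdoor=False, label="dog"),  # clean, correct
    Result("cat", flagged=True, is_backdoor=False, label="cat"),   # clean, falsely flagged
    Result("banana", flagged=False, is_backdoor=True),             # attack succeeds
    Result("banana", flagged=True, is_backdoor=True),              # caught by detector
]
print(clean_accuracy(results))                 # 0.5
print(attack_success_rate(results, "banana"))  # 0.5

# Toy captioning example: the target concept appears in the caption.
caption = [Result("A Banana on a wooden table", flagged=False, is_backdoor=True)]
print(attack_success_rate(caption, "banana", captioning=True))  # 1.0
```

Treating a flagged input as a failure for both metrics means aggressive flagging lowers ASR, but every false positive also costs CA, which mirrors the role of FPR in the detection metrics.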

Research focus

Computer Vision (1), Multimodal Models (1), Red-Teaming & Adversarial Robustness (1)

Frequent co-authors

Yijiang Li (1), Ni Gao (1), Xin Yan (1), Xingfu Yan (1)

Papers (1)

Mar 12, 2026

BackdoorIDS: Zero-shot Backdoor Detection for Pretrained Vision Encoder

Uncover hidden backdoors in your pre-trained vision encoders without retraining, simply by watching how attention shifts as you mask parts of the image.

Siquan Huang, Yijiang Li, Ni Gao +3

Computer Vision, Multimodal Models, Red-Teaming & Adversarial Robustness
