K validation set Deng et al. (2009) using CLIP Radford et al. (2021), evaluated with both ResNet-50 He et al. (2016) and ViT-B/32 Dosovitskiy et al. (2020) as vision encoders (denoted BadCLIP (RN) and BadCLIP (ViT), respectively); and BadVision on COCO Caption Chen et al. (2015) with LLaVA-1.5 Liu et al. (2024), which is built on a CLIP ViT-L-336px encoder Radford et al. (2021). We poison the test set with 5% backdoor samples. Further details are provided in the Appendix.

Defense Baselines. We compare our method with several state-of-the-art (SOTA) defenses for vision encoders, covering both detection and purification methods. DeDe Hou et al. (2025) detects backdoor samples by training a decoder on an auxiliary dataset; we use its out-of-distribution (OOD) setting, with STL-10 Coates et al. (2011) as the auxiliary data, since no access to the original training data is assumed. PatchProcessing Doan et al. (2023) identifies backdoor samples by manipulating image patches and monitoring for label flips. ZIP Shi et al. (2023) applies linear transformations to the input and then restores the degraded samples with diffusion models. SampDetox Yang et al. (2024) removes backdoors by adding noise to samples and then denoising them with diffusion models. For the purification methods, we use the same diffusion models as in their original papers.

Evaluation Metrics. To evaluate backdoor sample detection, we adopt two binary classification metrics: True Positive Rate (TPR) and False Positive Rate (FPR). Here, $TP$ and $FN$ count backdoor samples that are flagged and missed, respectively, while $FP$ and $TN$ count clean images that are flagged and passed. The TPR (i.e., recall) is computed as $\mathrm{TPR} = \frac{TP}{TP+FN}$ and measures the proportion of backdoor samples correctly identified as backdoors; a higher TPR indicates a more effective detector. The FPR is given by $\mathrm{FPR} = \frac{FP}{FP+TN}$ and represents the proportion of clean images incorrectly flagged as backdoors; a lower FPR indicates a smaller negative impact of the detector on the primary task. To assess the effect of the defense methods, we use the commonly adopted Attack Success Rate (ASR) and Clean Accuracy (CA).
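As a minimal sketch, assuming each test sample carries a ground-truth backdoor flag, the detector's decision, and the model's prediction, these metrics could be computed as follows. All function and argument names are hypothetical; the single `target_label` assumes an all-to-one attack on the classification benchmarks; and the whole-word match in `caption_attack_success` is only one plausible reading of the "main concept appears in the caption" criterion of Liu and Zhang (2025), not necessarily their exact rule. Following the convention that a flagged sample counts as a classification failure, flagged backdoor samples lower ASR and flagged clean images lower CA.

```python
import re
from typing import Sequence, Tuple

# Hypothetical helpers: names and signatures are illustrative, not from the paper.


def detection_rates(is_backdoor: Sequence[bool], flagged: Sequence[bool]) -> Tuple[float, float]:
    """Return (TPR, FPR) of a backdoor-sample detector."""
    pairs = list(zip(is_backdoor, flagged))
    tp = sum(1 for b, f in pairs if b and f)          # backdoors correctly flagged
    fn = sum(1 for b, f in pairs if b and not f)      # backdoors missed
    fp = sum(1 for b, f in pairs if not b and f)      # clean images wrongly flagged
    tn = sum(1 for b, f in pairs if not b and not f)  # clean images correctly passed
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def asr_and_ca(
    is_backdoor: Sequence[bool],
    flagged: Sequence[bool],
    predictions: Sequence[int],
    labels: Sequence[int],
    target_label: int,
) -> Tuple[float, float]:
    """Return (ASR, CA); a flagged sample always counts as a classification failure."""
    attack_hits, clean_hits = [], []
    for b, f, pred, label in zip(is_backdoor, flagged, predictions, labels):
        if b:
            # The attack succeeds only if the detector missed the sample
            # and the model output the attacker's target label.
            attack_hits.append(not f and pred == target_label)
        else:
            # A clean image counts as correct only if it was not flagged
            # and the model predicted its true label.
            clean_hits.append(not f and pred == label)
    asr = sum(attack_hits) / len(attack_hits) if attack_hits else 0.0
    ca = sum(clean_hits) / len(clean_hits) if clean_hits else 0.0
    return asr, ca


def caption_attack_success(caption: str, target_concept: str) -> bool:
    """Assumed captioning ASR check: case-insensitive whole-word match of the target concept."""
    pattern = rf"\b{re.escape(target_concept)}\b"
    return re.search(pattern, caption, flags=re.IGNORECASE) is not None


if __name__ == "__main__":
    is_backdoor = [True, True, False, False, False]
    flagged = [True, False, True, False, False]
    print(detection_rates(is_backdoor, flagged))
    print(asr_and_ca(is_backdoor, flagged, [0, 7, 3, 4, 9], [1, 2, 3, 4, 5], target_label=7))
    print(caption_attack_success("A banana on a wooden table", "banana"))
```

In this toy example of two backdoor samples and three clean images, one backdoor is caught and one clean image is wrongly flagged, giving a TPR of 0.5 and an FPR of 1/3. The same flags also drive ASR and CA, which makes explicit how a high FPR directly reduces CA under this evaluation protocol.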
ASR measures model performance on backdoor samples, while CA measures performance on clean images. In detection evaluation, any sample flagged as a backdoor, whether it is truly a backdoor or benign, is counted as a classification failure. A high CA indicates minimal impact on the model, whereas a low ASR indicates an effective defense. For the image captioning task, we report the CIDEr score Vedantam et al. (2015) as the CA. For captioning ASR, following Liu and Zhang (2025), an attack is regarded as successful if the main concept of the attack target appears in the generated caption.

Implementation Details. To facilitate reproducibility, we provide the following experimental details. For sample processing, we partition images of any size into