This paper introduces DeBias-Attack, a novel approach to enhance the transferability of adversarial examples in Vision-Language Pre-training (VLP) models by addressing surrogate-specific bias in adversarial optimization. By maintaining two perturbation branches, one optimizing on the original image and the other on a weak-semantic image, the method effectively corrects the bias that arises from surrogate model dependencies. Experimental results demonstrate that DeBias-Attack significantly improves adversarial transferability across various VLP models and tasks, showcasing its robustness against both open-source and closed-source multimodal large language models.
Correcting surrogate-specific bias in adversarial optimization can dramatically enhance the transferability of attacks across Vision-Language Pre-training models.
Adversarial examples reveal vulnerabilities in Vision-Language Pre-training (VLP) models and provide insights for improving robustness. A key property is cross-model transferability, which enables transfer-based black-box attacks. However, existing attacks often rely heavily on the surrogate model, causing cross-model performance drops. One reason is that adversarial optimization may follow surrogate model responses more than input semantics, making the update direction effective on the surrogate but less transferable to unseen targets. We refer to this dependency as surrogate-specific bias. Motivated by this observation, DeBias-Attack improves transferability by correcting surrogate-specific bias in adversarial optimization directions. It maintains two perturbation branches. The main branch optimizes a perturbation on the original image and obtains the adversarial gradient used to disrupt image-text alignment. The reference branch optimizes a perturbation on a weak-semantic image constructed from the dataset mean image with small Gaussian noise resampled at each iteration. Since this weak-semantic image contains little clear visual content, its optimization reflects surrogate responses more than image semantics, and its reference gradient estimates surrogate-specific bias. DeBias-Attack removes the aligned projection of the main gradient on the reference gradient before updating the adversarial image, then performs context-aware text substitution using the updated adversarial image. DeBias-Attack is the first transfer-based VLP attack that corrects surrogate-specific bias through gradient correction. Experiments show strong performance across VLP models, downstream tasks, and open-source and closed-source multimodal large language models.
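The two ingredients described above, the weak-semantic reference image and the gradient correction, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `weak_semantic_image` and `remove_aligned_projection`, the noise scale `sigma`, the [0, 1] pixel range, and the small `eps` stabilizer are all hypothetical. In particular, the sketch reads "aligned projection" as the component of the main gradient along the reference gradient, removed only when the two gradients have a positive inner product; the paper may define it differently.

```python
import numpy as np


def weak_semantic_image(mean_image, sigma, rng):
    """Dataset mean image plus freshly sampled Gaussian noise.

    Assumption: pixels live in [0, 1], so the result is clipped to that range.
    Calling this once per iteration resamples the noise each time.
    """
    noise = rng.normal(0.0, sigma, size=mean_image.shape)
    return np.clip(mean_image + noise, 0.0, 1.0)


def remove_aligned_projection(g_main, g_ref, eps=1e-12):
    """Subtract from g_main its component along g_ref.

    Assumption: the component is removed only when the two gradients are
    positively aligned (inner product > 0); otherwise g_main is returned
    unchanged.
    """
    gm = g_main.ravel()
    gr = g_ref.ravel()
    dot = float(gm @ gr)
    if dot <= 0.0:
        return g_main.copy()
    coeff = dot / (float(gr @ gr) + eps)
    return g_main - coeff * g_ref


# Toy usage with hand-picked gradients instead of a real surrogate model.
rng = np.random.default_rng(0)
mean_image = np.full((2, 2, 3), 0.5)
reference_input = weak_semantic_image(mean_image, sigma=0.01, rng=rng)

g_main = np.array([1.0, 1.0])
g_ref = np.array([1.0, 0.0])
g_corrected = remove_aligned_projection(g_main, g_ref)
```

In this toy case the corrected gradient keeps only the part of `g_main` that is orthogonal to `g_ref`. In an actual attack, `g_main` and `g_ref` would come from backpropagating the surrogate's image-text alignment loss through the main and reference branches respectively.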