This paper introduces patch-based adversarial noise compression (PANC), a decision-based attack method designed for Transformer-based visual trackers. Using a noise sensitivity matrix, PANC adjusts adversarial noise patch by patch. Compared with the IoU attack, the prior state-of-the-art method, it improves attack effectiveness by 162% while compressing the noise level to 10% and using only 45.7% of the queries. The method was validated on multiple Transformer-based trackers and benchmark datasets, exposing vulnerabilities in current models and offering a new approach to adversarial example generation.
Compared with the IoU attack, PANC improves attack effectiveness against Transformer-based visual trackers by 162% while compressing adversarial noise to 10%.
In recent years, as Vision Transformers (ViTs) have been widely adopted in visual trackers, the robustness of these trackers has received increasing attention. However, because ViT focuses on global interactions between image patches, it is less sensitive to local noise, which poses new challenges for adversarial attacks. Meanwhile, existing decision-based adversarial attack methods often overlook the differences in noise sensitivity between patches, which further limits how efficiently adversarial noise can be compressed, especially for ViT. In visual tracking, existing adversarial attack methods primarily target Siamese network-based trackers, and research on adversarial attacks against Transformer-based trackers, particularly decision-based black-box attacks, remains limited. To enable effective black-box attacks on Transformer-based trackers, this paper proposes patch-based adversarial noise compression (PANC), a decision-based adversarial attack method. PANC compresses adversarial noise patch by patch, significantly improving compression efficiency and attack concealment. It also introduces a noise sensitivity matrix that dynamically adds and reduces adversarial noise, optimizing the spatial distribution of noise while decreasing the number of queries. We validated PANC on several Transformer-based trackers, including OSTrack, STARK, TransT, and MixFormerV2, and on three public large-scale benchmark datasets: GOT-10k, TrackingNet, and LaSOT. Experimental results show that, compared with the existing state-of-the-art adversarial attack method, the IoU attack, PANC compresses the noise level to 10% and improves attack effectiveness by 162% while using only 45.7% of the queries. Furthermore, PANC can serve as an initialization or post-processing optimization strategy for other adversarial attack methods, providing a more flexible and efficient mechanism for adversarial example generation.
Our work reveals the vulnerabilities of existing Transformer-based visual trackers and offers new ideas for further improving the efficiency and concealment of adversarial attacks.
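To make the idea of patch-by-patch noise compression under a decision-only oracle more concrete, here is a minimal illustrative sketch. It is not the paper's algorithm: the function name `compress_noise_by_patch`, the `is_adversarial` callback, the halving factor, and the way the sensitivity scores are updated are all assumptions chosen for illustration. It also only shrinks noise, whereas the abstract describes a matrix that both adds and reduces it. The sketch assumes a single image whose height and width are multiples of the patch size, and a black-box oracle that only reports whether the attack still succeeds.

```python
import numpy as np

def compress_noise_by_patch(clean, adv, is_adversarial, patch=16, shrink=0.5, rounds=3):
    """Greedily shrink adversarial noise one patch at a time (illustrative only).

    clean, adv      : float arrays of identical shape (H, W) or (H, W, C)
    is_adversarial  : hypothetical decision oracle, image -> bool (one query per call)
    Returns the compressed adversarial image, the sensitivity scores, and the query count.
    """
    noise = adv.astype(np.float64) - clean.astype(np.float64)
    rows, cols = clean.shape[0] // patch, clean.shape[1] // patch

    # Assumed scoring rule: a higher score means shrinking noise in that patch
    # has tended to break the attack, so the patch is treated as more sensitive.
    sensitivity = np.zeros((rows, cols))
    queries = 0

    for _ in range(rounds):
        # Visit the patches currently believed to be least sensitive first.
        order = np.argsort(sensitivity, axis=None, kind="stable")
        for flat in order:
            r, c = divmod(int(flat), cols)
            ys = slice(r * patch, (r + 1) * patch)
            xs = slice(c * patch, (c + 1) * patch)
            if not np.any(noise[ys, xs]):
                continue  # nothing left to compress in this patch

            trial = noise.copy()
            trial[ys, xs] *= shrink  # propose smaller noise in this patch only
            queries += 1
            if is_adversarial(clean + trial):
                noise = trial            # attack survives: keep the smaller noise
                sensitivity[r, c] -= 1
            else:
                sensitivity[r, c] += 1   # attack broke: leave this patch's noise alone

    return clean + noise, sensitivity, queries
```

In a tracking setting, `is_adversarial` would wrap a query to the tracker and a success criterion on its predicted box; both are left abstract here because the page does not specify them.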