This paper introduces patch-based adversarial noise compression (PANC), a decision-based black-box attack on Transformer-based visual trackers that exploits differences in noise sensitivity between image patches. PANC uses a noise sensitivity matrix to add and reduce adversarial noise patch by patch, which improves the spatial distribution of the noise and lowers the query count. In experiments on OSTrack, STARK, TransT, and MixformerV2 with the GOT-10k, TrackingNet, and LaSOT datasets, PANC compresses the noise level to 10%, improves attack effectiveness by 162%, and uses only 45.7% of the queries, all relative to the IoU attack.
Transformer-based visual trackers, often assumed to be robust, can be significantly disrupted by patch-targeted adversarial noise, and the attack needs far fewer queries than the IoU attack.
As Vision Transformers (ViT) have become widely used in visual trackers, the robustness of these trackers has received increasing attention. Because ViT models global interactions between image patches, it is less sensitive to local noise, which poses new challenges for adversarial attacks. Existing decision-based attack methods also tend to overlook the differences in noise sensitivity between patches, which limits how efficiently adversarial noise can be compressed, especially for ViT. In visual tracking, existing adversarial attacks primarily target Siamese network-based trackers, and research on attacks against Transformer-based trackers, particularly decision-based black-box attacks, remains limited. To enable effective black-box attacks on Transformer-based trackers, this paper proposes patch-based adversarial noise compression (PANC), a decision-based adversarial attack method. PANC compresses adversarial noise patch by patch, significantly improving compression efficiency and attack concealment. It also introduces a noise sensitivity matrix that dynamically adds and reduces adversarial noise, optimizing the spatial distribution of the noise while decreasing the number of queries. We validate PANC on several Transformer-based trackers (OSTrack, STARK, TransT, and MixformerV2) and on three public large-scale benchmark datasets: GOT-10k, TrackingNet, and LaSOT. Experimental results show that, compared with the IoU attack, the existing state-of-the-art adversarial attack method, PANC compresses the noise level to 10% and improves attack effectiveness by 162% while using only 45.7% of the queries. Furthermore, PANC can serve as an initialization or post-processing optimization strategy for other adversarial attack methods, providing a more flexible and efficient mechanism for adversarial example generation.
Our work reveals the vulnerabilities of existing Transformer-based visual trackers and offers new ideas for further improving the efficiency and concealment of adversarial attacks.
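The idea of compressing noise patch by patch using only the tracker's yes/no outcome can be illustrated with a toy sketch. This is not the paper's algorithm: the function `compress_patch_noise`, the oracle `toy_oracle`, the hidden weights, the per-patch scalar amplitudes, and the score update rule are all assumptions made for illustration. The sketch covers only the reduction side, shrinking one patch at a time and rejecting any reduction that breaks the attack, and it uses a simple per-patch score as a stand-in for a noise sensitivity matrix.

```python
import random


def compress_patch_noise(scales, is_adversarial, max_queries=200,
                         shrink=0.5, floor=1e-3, seed=0):
    """Greedy patch-wise noise compression driven only by yes/no oracle answers.

    scales         -- dict: patch index -> noise amplitude (adversarial at start)
    is_adversarial -- hypothetical black-box oracle; True while the attack succeeds
    Returns (compressed scales, sensitivity scores, queries used).
    """
    rng = random.Random(seed)
    scales = dict(scales)
    # Illustrative stand-in for a noise sensitivity matrix: a per-patch score
    # that rises when shrinking the patch breaks the attack.
    sensitivity = {patch: 0.5 for patch in scales}
    queries = 0
    while queries < max_queries:
        # Skip patches already compressed to the floor or marked too sensitive.
        candidates = [p for p in scales
                      if scales[p] > floor and sensitivity[p] < 1.0]
        if not candidates:
            break
        # Try the patch currently believed to be least sensitive; ties are random.
        patch = min(candidates, key=lambda p: (sensitivity[p], rng.random()))
        trial = dict(scales)
        trial[patch] = scales[patch] * shrink
        queries += 1
        if is_adversarial(trial):
            scales = trial  # keep the reduced noise
            sensitivity[patch] *= 0.9
        else:
            # Reject the trial (the patch keeps its noise) and raise its score.
            sensitivity[patch] = min(1.0, sensitivity[patch] + 0.25)
    return scales, sensitivity, queries


# Toy oracle (assumption): the attack succeeds while the weighted noise sum
# stays at or above a threshold. Only the two diagonal patches matter here.
HIDDEN_WEIGHTS = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}


def toy_oracle(scales):
    return sum(HIDDEN_WEIGHTS[p] * s for p, s in scales.items()) >= 1.2


initial = {patch: 1.0 for patch in HIDDEN_WEIGHTS}
final, sensitivity, used = compress_patch_noise(initial, toy_oracle)
print(final, used)
```

In this toy setting the noise on the two patches the oracle ignores is driven down to the floor, while the noise on the two patches that matter is kept high enough for the attack to keep succeeding. A real tracker attack would replace `toy_oracle` with a query to the tracker and a success criterion such as an overlap threshold, which is also an assumption here.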