This paper addresses the lack of standardized evaluation of adversarial transferability in image classification. The authors review hundreds of existing transfer-based attack methods, organize them into six categories, and propose a comprehensive benchmark framework for evaluating these attacks more objectively and fairly. The work also identifies common strategies that enhance transferability and prevalent issues that lead to unfair comparisons, contributing to a more rigorous understanding of adversarial transferability.
A new benchmark exposes the limitations of current adversarial transferability evaluations, revealing that many reported gains may be inflated due to unfair comparisons.
Adversarial transferability is the capacity of adversarial examples generated on a surrogate model to deceive other, unseen victim models. Because this property removes the need for direct access to the victim model during an attack, it raises considerable security concerns in practical applications and has recently attracted substantial research attention. In this work, we identify the lack of a standardized framework and criteria for evaluating transfer-based attacks, which can lead to biased assessments of existing approaches. To close this gap, we conduct an exhaustive review of hundreds of related works and organize the transfer-based attacks into six distinct categories. We then propose a comprehensive framework designed to serve as a benchmark for evaluating these attacks. In addition, we delineate common strategies that enhance adversarial transferability and highlight prevalent issues that could lead to unfair comparisons. Finally, we provide a brief review of transfer-based attacks beyond image classification.
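The transfer setting defined in the abstract can be illustrated with a minimal sketch. It is not the paper's benchmark: the two linear classifiers, their weights, the four data points, and the helper names (`fgsm_linear`, `predict_linear`, `attack_success_rate`) are all assumptions made for illustration, and the one-step sign-gradient (FGSM-style) attack is used here only as a simple stand-in for a transfer-based attack. The adversarial examples are crafted with gradients from the surrogate alone, then scored against a victim whose weights the attack never reads.

```python
import math


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_linear(x, y, w, b, eps):
    """One-step sign-gradient perturbation against a logistic surrogate.

    For logistic loss with label y in {0, 1}, the input gradient is
    (p - y) * w, where p = sigmoid(w . x + b).
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]


def predict_linear(x, w, b):
    """Predict class 1 if the linear score is positive, else class 0."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)


def attack_success_rate(predict, xs, ys):
    """Fraction of inputs that the given model misclassifies."""
    return sum(predict(x) != y for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data and models (assumptions, not taken from the paper).
xs = [[1.0, 1.0], [1.2, 0.8], [-1.0, -1.0], [-0.7, -1.1]]
ys = [1, 1, 0, 0]
w_surrogate, b_surrogate = [1.0, 1.0], 0.0  # white-box model used to craft attacks
w_victim, b_victim = [1.0, 0.5], 0.0        # unseen model, never queried for gradients


def victim(x):
    return predict_linear(x, w_victim, b_victim)


# Craft adversarial examples using the surrogate only.
xs_adv = [fgsm_linear(x, y, w_surrogate, b_surrogate, eps=1.5)
          for x, y in zip(xs, ys)]

clean_rate = attack_success_rate(victim, xs, ys)         # victim error on clean inputs
transfer_rate = attack_success_rate(victim, xs_adv, ys)  # victim error on transferred inputs
print("victim error on clean inputs:", clean_rate)
print("victim error on transferred adversarial inputs:", transfer_rate)
```

On this hand-picked data the victim classifies every clean point correctly and misclassifies every perturbed point, so the attack transfers completely. Real surrogate and victim networks differ far more than these two linear models, which is why the measured transfer rate depends heavily on the choice of models, perturbation budget, and data, the kind of evaluation detail a standardized benchmark is meant to fix.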