PivotAttack introduces a novel "inside-out" hard-label text attack framework that identifies and perturbs "Pivot Sets" of tokens to induce label flips. A Multi-Armed Bandit algorithm is used to efficiently explore combinatorial token groups and their impact on model predictions, capturing inter-word dependencies. Experiments demonstrate that PivotAttack achieves higher Attack Success Rates with significantly fewer queries compared to existing "outside-in" methods on both traditional models and LLMs.
Forget brute-force search: PivotAttack uses a clever "inside-out" strategy to find the exact words that flip an LLM's classification with far fewer queries.
Existing hard-label text attacks often rely on inefficient "outside-in" strategies that traverse vast search spaces. We propose PivotAttack, a query-efficient "inside-out" framework. It employs a Multi-Armed Bandit algorithm to identify Pivot Sets (combinatorial token groups acting as prediction anchors) and strategically perturbs them to induce label flips. This approach captures inter-word dependencies and minimizes query costs. Extensive experiments across traditional models and Large Language Models demonstrate that PivotAttack consistently outperforms state-of-the-art baselines in both Attack Success Rate and query efficiency.
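To make the bandit idea concrete, here is a minimal sketch of treating candidate token groups as bandit arms: pulling an arm masks that group, queries a hard-label victim, and scores a reward when the label flips. The toy `victim_label` classifier, the UCB1 selection rule, and all function names are illustrative assumptions, not the paper's actual algorithm or models.

```python
import math
import random
from itertools import combinations

# Hypothetical hard-label victim: predicts 1 iff the text contains more
# positive cue words than negative ones (stands in for a real classifier).
POSITIVE = {"great", "wonderful", "excellent"}
NEGATIVE = {"boring", "dull"}

def victim_label(tokens):
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return 1 if pos > neg else 0

def ucb_pivot_search(tokens, set_size=2, budget=60, seed=0):
    """Sketch of a UCB1 bandit over candidate token groups ("arms").

    Pulling an arm masks that group of tokens and queries the victim once;
    the reward is 1 if the hard label flips. The arm with the best flip
    rate approximates a Pivot Set. Illustrative only.
    """
    rng = random.Random(seed)
    base = victim_label(tokens)
    arms = list(combinations(range(len(tokens)), set_size))
    counts = [0] * len(arms)
    rewards = [0.0] * len(arms)
    for t in range(1, budget + 1):
        # Try each arm once first, then pick the arm with the best UCB score.
        untried = [i for i, c in enumerate(counts) if c == 0]
        if untried:
            i = rng.choice(untried)
        else:
            i = max(range(len(arms)),
                    key=lambda j: rewards[j] / counts[j]
                    + math.sqrt(2 * math.log(t) / counts[j]))
        masked = [tok for k, tok in enumerate(tokens) if k not in arms[i]]
        counts[i] += 1
        rewards[i] += 1.0 if victim_label(masked) != base else 0.0
    best = max(range(len(arms)), key=lambda j: rewards[j] / max(counts[j], 1))
    return [tokens[k] for k in arms[best]]

tokens = "a great wonderful but slightly dull film".split()
pivot = ucb_pivot_search(tokens)
```

Scoring whole token groups, rather than single words, is what lets the bandit capture inter-word dependencies; the query budget directly bounds the number of victim calls, which is the cost metric the abstract emphasizes.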