This paper analyzes the Follow-the-Perturbed-Leader (FTPL) algorithm with geometric resampling (GR) in the context of $m$-set semi-bandit problems, a special case of combinatorial semi-bandits. It demonstrates that FTPL, when combined with Fréchet and Pareto distributions, achieves the optimal regret bound of $O(\sqrt{mdT})$ in adversarial settings and logarithmic regret in stochastic settings, establishing its "Best-of-Both-Worlds" optimality. Furthermore, the authors introduce conditional geometric resampling to reduce the computational complexity of FTPL from $O(d^2)$ to $O(md(\log(d/m)+1))$ without compromising regret performance.
FTPL finally achieves "Best-of-Both-Worlds" optimality for $m$-set semi-bandit problems, matching FTRL's regret guarantees with significantly reduced computational complexity via conditional geometric resampling.
This paper studies the optimality and computational complexity of the Follow-the-Perturbed-Leader (FTPL) policy in $m$-set semi-bandit problems. FTPL has been studied extensively as a promising candidate for an efficient algorithm with favorable regret in adversarial combinatorial semi-bandits. Nevertheless, the optimality of FTPL has remained open, unlike that of Follow-the-Regularized-Leader (FTRL), which has been proved for various online learning tasks. In this paper, we extend the analysis of FTPL with geometric resampling (GR) to $m$-set semi-bandits, a special case of combinatorial semi-bandits, showing that FTPL with Fr\'{e}chet and Pareto distributions with certain parameters achieves the best possible regret of $O(\sqrt{mdT})$ in the adversarial setting. We also show that FTPL with Fr\'{e}chet and Pareto distributions with a certain parameter achieves logarithmic regret in the stochastic setting, establishing the Best-of-Both-Worlds optimality of FTPL for $m$-set semi-bandit problems. Furthermore, we extend conditional geometric resampling to $m$-set semi-bandits for efficient loss estimation in FTPL, reducing the computational complexity from the $O(d^2)$ of the original geometric resampling to $O(md(\log(d/m)+1))$ without sacrificing regret performance.
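To make the two ingredients concrete, here is a minimal sketch of FTPL with Fréchet perturbations and (unconditional) geometric resampling for $m$-set semi-bandits. This is an illustration under our own assumptions, not the authors' implementation: the learning rate `eta`, the Fréchet shape `alpha`, the resampling cap, and all function names are hypothetical choices made for readability. The key idea is that the number of resampling rounds until a fixed arm reappears in the played $m$-set is geometrically distributed with mean $1/p_i$, giving an unbiased estimate of the inverse selection probability without computing it explicitly.

```python
import numpy as np

def frechet_sample(rng, d, alpha=2.0):
    # Draw d i.i.d. Fréchet(alpha) perturbations via inverse-CDF:
    # F(x) = exp(-x^{-alpha})  =>  x = (-log U)^{-1/alpha}, U ~ Uniform(0,1)
    u = rng.uniform(size=d)
    return (-np.log(u)) ** (-1.0 / alpha)

def ftpl_action(L_hat, eta, z, m):
    # FTPL for m-sets: play the m arms with the smallest
    # perturbed cumulative estimated loss  L_hat_i - z_i / eta.
    scores = L_hat - z / eta
    return np.argsort(scores)[:m]

def geometric_resampling(rng, L_hat, eta, m, arm, cap=1000):
    # Estimate 1 / P(arm is in the played m-set): redraw perturbations
    # until the arm is selected again; the count K is Geometric(p_arm),
    # so E[K] = 1/p_arm. The cap keeps the loop finite (a hypothetical
    # truncation, which biases the estimate slightly).
    for k in range(1, cap + 1):
        z = frechet_sample(rng, len(L_hat))
        if arm in ftpl_action(L_hat, eta, z, m):
            return k
    return cap

# One round of the learning loop (d arms, m-sets):
rng = np.random.default_rng(0)
d, m, eta = 6, 3, 1.0
L_hat = np.zeros(d)                       # cumulative loss estimates
action = ftpl_action(L_hat, eta, frechet_sample(rng, d), m)
observed_losses = rng.uniform(size=d)     # semi-bandit feedback on played arms
for arm in action:
    K = geometric_resampling(rng, L_hat, eta, m, arm)
    L_hat[arm] += observed_losses[arm] * K   # importance-weighted estimate
```

The $O(d^2)$ cost the paper attributes to plain GR comes from this per-arm resampling loop; the conditional variant it proposes reduces the expected work to $O(md(\log(d/m)+1))$ per round, though its mechanics are beyond this sketch.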