INRIAParis-SaclayJul 5, 2018arXiv:2604.14860

Best of both worlds: Stochastic & adversarial best-arm identification

Yasin Abbasi-Yadkori, P. Bartlett, Victor Gabillon, Alan Malek, Michal Valko

AI Summary

This paper investigates best-arm identification in bandit problems under both stochastic and adversarial reward settings. The authors prove the impossibility of a single learner achieving optimal performance in both settings without knowledge of the reward nature, and derive a lower bound on the achievable error rate in stochastic problems under adversarial robustness constraints. They then propose a parameter-free algorithm that achieves near-optimal error rates in stochastic settings while maintaining robustness against adversarial rewards.

Key Contribution

You can have your cake and eat it too: this new algorithm nearly matches the optimal performance for stochastic best-arm identification while remaining robust to adversarial attacks, despite the theoretical impossibility of a universally optimal learner.

Abstract

We study bandit best-arm identification with arbitrary and potentially adversarial rewards. A simple random uniform learner obtains the optimal rate of error in the adversarial scenario. However, this type of strategy is suboptimal when the rewards are sampled stochastically. Therefore, we ask: Can we design a learner that performs optimally in both the stochastic and adversarial problems while not being aware of the nature of the rewards? First, we show that designing such a learner is impossible in general. In particular, to be robust to adversarial rewards, we can only guarantee optimal rates of error on a subset of the stochastic problems. We give a lower bound that characterizes the optimal rate in stochastic problems if the strategy is constrained to be robust to adversarial rewards. Finally, we design a simple parameter-free algorithm and show that its probability of error matches (up to log factors) the lower bound in stochastic problems, and it is also robust to adversarial ones.

Red-Teaming & Adversarial Robustness RLHF & Preference Learning

Citation Metrics

Citations54

Influential citations8

References28

Year2018

VenueAnnual Conference Computational Learning Theory

Related Papers

Finding related papers...

Search

Best of both worlds: Stochastic & adversarial best-arm identification

Related Papers