Search papers, labs, and topics across Lattice.
This paper introduces and addresses the problem of algorithmic replicability in constrained multi-armed bandit (MAB) problems, where the learner must maximize reward while satisfying multiple constraints. The authors design replicable algorithms for constrained MABs that achieve comparable regret and constraint violation bounds to non-replicable algorithms. As a crucial intermediate step, they develop a replicable UCB-like algorithm for unconstrained MABs, demonstrating that optimism-based algorithms can be made replicable.
Replicable algorithms can now achieve the same performance as non-replicable ones in constrained multi-armed bandit problems, opening the door to more reliable and reproducible online learning experiments.
Algorithmic \emph{replicability} has recently been introduced to address the need for reproducible experiments in machine learning. A \emph{replicable online learning} algorithm is one that takes the same sequence of decisions across different executions in the same environment, with high probability. We initiate the study of algorithmic replicability in \emph{constrained} MAB problems, where a learner interacts with an unknown stochastic environment for $T$ rounds, seeking not only to maximize reward but also to satisfy multiple constraints. Our main result is that replicability can be achieved in constrained MABs. Specifically, we design replicable algorithms whose regret and constraint violation match those of non-replicable ones in terms of $T$. As a key step toward these guarantees, we develop the first replicable UCB-like algorithm for \emph{unconstrained} MABs, showing that algorithms that employ the optimism in-the-face-of-uncertainty principle can be replicable, a result that we believe is of independent interest.