DeepMind, ENSAE & Criteo AI Lab, INRIA, Paris-Saclay | Apr 15, 2026 | arXiv:2604.13738

Covariance-adapting algorithm for semi-bandits with application to sparse rewards

AI Summary

This paper studies stochastic combinatorial semi-bandits under a general family of sub-exponential distributions that includes both bounded and Gaussian distributions. The authors derive a new lower bound on the expected regret, parameterized by the unknown covariance matrix of outcomes, which is a tighter quantity than the sub-Gaussian matrix. They then propose a covariance-adapting algorithm with a tight asymptotic regret analysis, and extend the results to sparse outcomes, a setting relevant to recommender systems.

Key Contribution

Instead of requiring prior knowledge of a sub-Gaussian matrix, the proposed semi-bandit algorithm uses estimates of the unknown covariance matrix of outcomes, a tighter quantity, and comes with a tight asymptotic regret analysis.

Abstract

We investigate stochastic combinatorial semi-bandits, where the entire joint distribution of outcomes impacts the complexity of the problem instance (unlike in standard bandits). Typical distributions considered depend on specific parameter values, whose prior knowledge is required in theory but which are quite difficult to estimate in practice; an example is the commonly assumed sub-Gaussian family. We alleviate this issue by instead considering a new general family of sub-exponential distributions, which contains bounded and Gaussian ones. We prove a new lower bound on the expected regret over this family, parameterized by the unknown covariance matrix of outcomes, a tighter quantity than the sub-Gaussian matrix. We then construct an algorithm that uses covariance estimates, and provide a tight asymptotic analysis of the regret. Finally, we apply and extend our results to the family of sparse outcomes, which has applications in many recommender systems.
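To make the idea of "covariance estimates" under semi-bandit feedback concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm: it only shows one plausible way to maintain empirical covariances when, in each round, only the outcomes of the played items are observed. The class `PairwiseCovarianceEstimator`, the function `run_demo`, the three-item Gaussian environment, and the uniformly random choice of actions are all assumptions made for illustration.

```python
import random


class PairwiseCovarianceEstimator:
    """Empirical covariances from semi-bandit feedback (hypothetical helper).

    The pair (i, j) is updated only in rounds where both items were played,
    since those are the only rounds in which both outcomes are observed.
    """

    def __init__(self, n_items):
        # counts[i][j]: number of rounds in which i and j were both played
        self.counts = [[0] * n_items for _ in range(n_items)]
        # sum_first[i][j]: sum of X_i over rounds where i and j were both played
        self.sum_first = [[0.0] * n_items for _ in range(n_items)]
        # sum_prod[i][j]: sum of X_i * X_j over those same rounds
        self.sum_prod = [[0.0] * n_items for _ in range(n_items)]

    def update(self, action, outcomes):
        """action: item indices played; outcomes: dict item -> observed value."""
        for i in action:
            for j in action:
                self.counts[i][j] += 1
                self.sum_first[i][j] += outcomes[i]
                self.sum_prod[i][j] += outcomes[i] * outcomes[j]

    def covariance(self, i, j):
        """Plug-in (biased) covariance estimate, or None if never co-observed."""
        c = self.counts[i][j]
        if c == 0:
            return None
        mean_i = self.sum_first[i][j] / c
        mean_j = self.sum_first[j][i] / c
        return self.sum_prod[i][j] / c - mean_i * mean_j


def run_demo(rounds=30000, seed=0):
    """Assumed toy environment: items 0 and 1 share a common Gaussian factor,
    item 2 is independent. Actions are random pairs, not a learning policy."""
    rng = random.Random(seed)
    est = PairwiseCovarianceEstimator(3)
    pairs = [(0, 1), (0, 2), (1, 2)]
    for _ in range(rounds):
        z = rng.gauss(0.0, 1.0)
        x = [z + rng.gauss(0.0, 0.5), z + rng.gauss(0.0, 0.5), rng.gauss(0.0, 1.0)]
        action = rng.choice(pairs)
        # Semi-bandit feedback: only the outcomes of played items are revealed.
        feedback = {i: x[i] for i in action}
        est.update(action, feedback)
    return est
```

Calling `est = run_demo()` and then `est.covariance(0, 1)` should give a value roughly near 1 in this toy setup (the variance of the shared factor), while `est.covariance(0, 2)` should be roughly near 0. A bandit algorithm would replace the random action choice with a policy that uses such estimates; that part is deliberately left out here.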

Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
