KL-regularization in multi-armed bandits provably achieves near-optimal regret that scales linearly with the number of arms, a significant improvement over classical results.