Log-barrier regularization unlocks optimal $\tilde{O}(t^{-1/4})$ last-iterate convergence in uncoupled matrix games with bandit feedback, finally closing the gap to the theoretical limit.
Even with noisy reward observations and unknown reward distributions, near-optimal online decision-making is possible using LCB thresholding, achieving competitive ratios of $1 - 1/e$ and $1/2$ in i.i.d. and non-i.i.d. settings, respectively.