Search papers, labs, and topics across Lattice.
University of Illinois Urbana-Champaign
1
0
3
Bounded reward noise can lead to a dramatic reduction in regret bounds, achieving \(O(\log T)\) versus the standard \(\tilde{O}(\sqrt{T})\) for sub-Gaussian conditions.