Q-learning regret bounds can be achieved without optimism, but are highly sensitive to the suboptimality gap, motivating a new smoothed exploration strategy.