Average-reward RL can finally handle the messy reality of non-stationary rewards and durations in semi-Markov decision processes (SMDPs), thanks to a clever harmonic mean trick.