Average-reward RL can finally handle the messy reality of non-stationary rewards and durations in SMDPs, thanks to a clever harmonic-mean trick.
Decomposing robot swarm state representations unlocks effective cooperation even with computationally limited agents.