Ditch unimodal policies: flow-based policies combined with distributional RL unlock SOTA performance on MuJoCo by capturing complex, multimodal return distributions.