Mixture-of-Experts (MoE) models, despite their scaling advantages, suffer from a surprising "spectral plasticity loss" in continual RL, but a simple Parseval penalty can recover performance.
Forget hand-crafted reward functions: MVR uses multi-view video and a frozen VLM to automatically shape RL rewards, teaching agents complex motions without getting stuck on static poses.