Forget hand-engineered reward functions: Reward-Zero uses language embeddings to give RL agents an intrinsic "sense of completion," dramatically improving sample efficiency and generalization.