Thomas Lord Department of Computer Science, University of Southern California; ∗Corresponding author
Stanford HAI
Grounding reward learning in natural-language rationales makes policies 2× more robust to spurious correlations and distribution shifts.
Learning robotic reward functions from a million trajectories reveals that comparing entire trajectories, not just individual frames, unlocks better generalization and learning from suboptimal data.
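The trajectory-level comparison described above is commonly formalized as a Bradley-Terry preference model over summed per-frame rewards, rather than frame-by-frame scoring. The sketch below illustrates that standard formulation only; the function name and setup are illustrative, not taken from the paper.

```python
import numpy as np

def trajectory_preference_loss(r_a, r_b, pref_a):
    """Bradley-Terry loss comparing two whole trajectories.

    r_a, r_b: per-frame predicted rewards for trajectories A and B (1-D arrays).
    pref_a:   1.0 if trajectory A is preferred, 0.0 otherwise.

    The comparison uses the sum of rewards over each full trajectory,
    so a few suboptimal frames inside an overall-better trajectory
    do not dominate the learning signal.
    """
    score_a, score_b = np.sum(r_a), np.sum(r_b)
    # P(A preferred) under the Bradley-Terry model
    p_a = 1.0 / (1.0 + np.exp(score_b - score_a))
    # Binary cross-entropy against the preference label
    return -(pref_a * np.log(p_a) + (1 - pref_a) * np.log(1 - p_a))

# A trajectory with uniformly higher predicted reward, labeled as
# preferred, should incur a small loss.
loss = trajectory_preference_loss(np.array([1.0, 1.0, 1.0]),
                                  np.array([0.0, 0.0, 0.0]),
                                  pref_a=1.0)
```

Because the loss depends only on the difference of trajectory-level sums, it can in principle be computed even when individual frames are noisy or suboptimal, which is consistent with the generalization claim above.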