Search papers, labs, and topics across Lattice.
1
0
3
22
Grounding reward learning in natural language rationales makes policies 2x more robust to spurious correlations and distribution shifts.