RLHF models can be made significantly more robust to distribution shift by incorporating distributionally robust optimization into both reward modeling and policy optimization.
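
To make the reward-modeling half of this claim concrete, here is a minimal sketch of what a distributionally robust loss could look like. The summary does not say which uncertainty set is used, so this assumes a KL-divergence ball of radius `rho` around the empirical preference data and a Bradley-Terry pairwise loss. The function names `bradley_terry_losses` and `kl_dro_loss` are hypothetical, and the dual is minimized over a fixed grid rather than solved exactly.

```python
import numpy as np

def bradley_terry_losses(r_chosen, r_rejected):
    """Per-pair negative log-likelihood: -log sigmoid(r_chosen - r_rejected)."""
    margin = np.asarray(r_chosen, dtype=float) - np.asarray(r_rejected, dtype=float)
    return np.logaddexp(0.0, -margin)  # stable log(1 + exp(-margin))

def kl_dro_loss(losses, rho, lambdas=None):
    """Approximate worst-case mean loss over Q with KL(Q || P_hat) <= rho.

    Uses the standard dual form
        inf_{lam > 0}  lam * rho + lam * log mean(exp(loss / lam)),
    evaluated on a grid of lam values (an assumption made for simplicity),
    so the result is an upper bound on the exact worst case.
    """
    losses = np.asarray(losses, dtype=float)
    if lambdas is None:
        lambdas = np.logspace(-2, 3, 400)
    n = losses.size
    best = np.inf
    for lam in lambdas:
        # log of the mean of exp(loss / lam), computed without overflow
        log_mean_exp = np.logaddexp.reduce(losses / lam) - np.log(n)
        best = min(best, lam * rho + lam * log_mean_exp)
    return best

# Illustrative reward-model scores for four preference pairs (made-up numbers).
losses = bradley_terry_losses([2.0, 1.0, 0.5, -0.5], [0.0, 0.5, 1.0, 1.0])
standard = losses.mean()
robust = kl_dro_loss(losses, rho=0.1)
```

With `rho = 0` the robust loss reduces to the ordinary mean loss, and as `rho` grows it moves toward the loss of the hardest pairs. Under this assumed formulation, that upweighting of hard examples is the mechanism by which a robust objective hedges against a shifted test distribution.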