By making RL agents fear a large negative reward that they regard as subjectively possible, "Golden Handcuffs" steers them toward safer behavior without sacrificing capability.