Learned approval in MONA can eliminate reward hacking, but at the cost of significantly under-optimizing for the intended task, revealing a critical trade-off in safe RL.