Reward hacking isn't just about incentives; it's also about wild directional swings in your model's parameter space, and constraining those swings can keep your LM on the straight and narrow.