Weak-to-strong reward models can ace the test but still fail in the real world, revealing a hidden brittleness in current preference learning approaches.