Search papers, labs, and topics across Lattice.
1
5
Forget fixed margins in RLHF: modeling the *strength* of human preferences with "preference-over-preference" learning boosts both discriminative accuracy and generative quality.