University of Massachusetts Amherst
Forget fixed margins in RLHF: modeling the *strength* of human preferences with "preference-over-preference" learning boosts both discriminative accuracy and generative quality.