Beijing University of Posts and Telecommunications
Stop wasting compute on noisy preference data: filtering your RLHF datasets by "Preference Difference" boosts reward model accuracy and alignment performance.
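A minimal sketch of the filtering idea described above. The source does not define the "Preference Difference" metric, so everything here is an assumption: the field names (`chosen_score`, `rejected_score`), the threshold value, and the use of raw annotator scores (the paper's actual metric could instead be a reward-model margin or an implicit-reward gap).

```python
# Hypothetical sketch: keep only preference pairs whose chosen/rejected
# score gap is large, on the assumption that near-tie pairs are noisy labels.
# Field names and the threshold are illustrative, not from the source.

def filter_by_preference_difference(pairs, threshold=1.0):
    """Drop pairs whose chosen-vs-rejected score gap is below threshold."""
    return [
        p for p in pairs
        if p["chosen_score"] - p["rejected_score"] >= threshold
    ]

pairs = [
    {"prompt": "q1", "chosen_score": 4.5, "rejected_score": 1.0},  # clear preference
    {"prompt": "q2", "chosen_score": 3.0, "rejected_score": 2.8},  # near-tie, likely noisy
]
kept = filter_by_preference_difference(pairs, threshold=1.0)
# q2's gap (0.2) falls below the threshold, so only q1 survives
```

The filtered subset would then be used to train the reward model in place of the full, noisier dataset.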