Stop wasting compute on noisy preference data: filtering your RLHF datasets by "Preference Difference" boosts reward model accuracy and alignment performance.
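The filtering idea can be pictured as a threshold on a per-pair score. This is a minimal sketch, not the method's actual implementation. It assumes each preference pair already carries a scalar "preference difference". Here that is a hypothetical `pref_diff` field, standing for something like the margin between the scores given to the chosen and rejected responses. The cutoff `threshold=0.3` is an arbitrary illustrative value, not one taken from the source.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str
    # Hypothetical field (assumption): a scalar "preference difference" for the pair,
    # e.g. the score of the chosen response minus the score of the rejected one.
    pref_diff: float


def filter_by_preference_difference(pairs, threshold):
    """Keep only pairs whose preference difference is at least `threshold`.

    Pairs with a small (or negative) difference are treated as ambiguous or
    noisy and dropped before reward-model training.
    """
    return [p for p in pairs if p.pref_diff >= threshold]


# Toy data for illustration only; these are not results from the source.
pairs = [
    PreferencePair("Q1", "good answer", "bad answer", 0.9),
    PreferencePair("Q2", "answer A", "answer B", 0.05),  # near-tie: likely noise
    PreferencePair("Q3", "clear answer", "vague answer", 0.4),
]

kept = filter_by_preference_difference(pairs, threshold=0.3)
print([p.prompt for p in kept])  # ['Q1', 'Q3']
```

With a higher threshold you keep fewer, more clearly separated pairs. A lower threshold keeps more data but lets more near-ties through. Where to set it is a tuning choice this sketch does not address.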