Li Auto Inc, Li Auto, Kuaishou Technology
Stop wasting compute on noisy preference data: filtering your RLHF datasets by "Preference Difference" boosts reward model accuracy and alignment performance.