Stochastic negative sampling in Direct Preference Optimization (DPO) dramatically boosts recommendation accuracy, suggesting that carefully curated "wrong" answers are key to preference learning.
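The mechanics behind the claim can be sketched in a few lines: DPO scores a preferred item against a dispreferred one via a scaled log-probability margin, and stochastic negative sampling simply draws that dispreferred item at random from a candidate pool each step. This is a minimal illustrative sketch, not the paper's implementation; the function names, the toy log-probabilities, and the uniform sampling choice are all assumptions for demonstration.

```python
import math
import random

def dpo_loss(logp_pos, logp_neg, ref_logp_pos, ref_logp_neg, beta=0.1):
    """Standard DPO objective for one preference pair:
    -log sigmoid of the beta-scaled margin between the policy/reference
    log-ratios of the preferred and dispreferred items."""
    margin = beta * ((logp_pos - ref_logp_pos) - (logp_neg - ref_logp_neg))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

def sampled_dpo_loss(logp, ref_logp, pos_item, neg_pool, beta=0.1, rng=random):
    """Stochastic negative sampling: rather than training against one fixed
    dispreferred item, draw the negative uniformly from a candidate pool
    on every step (hypothetical helper, for illustration only)."""
    neg_item = rng.choice(neg_pool)
    loss = dpo_loss(logp[pos_item], logp[neg_item],
                    ref_logp[pos_item], ref_logp[neg_item], beta)
    return loss, neg_item

# Toy usage: per-item log-probs under the policy and a frozen reference.
logp = {"a": -1.0, "b": -2.5, "c": -3.0}
ref_logp = {"a": -1.5, "b": -2.0, "c": -3.0}
loss, neg = sampled_dpo_loss(logp, ref_logp, pos_item="a", neg_pool=["b", "c"])
```

With a zero margin the loss is exactly log 2, and it shrinks as the policy pushes the preferred item's log-ratio above the sampled negative's, which is the gradient signal that the choice of negatives shapes.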