Search papers, labs, and topics across Lattice.
Stochastic negative sampling in Direct Preference Optimization (DPO) dramatically improves multimodal sequential recommendation, suggesting that carefully curated "wrong" answers are key to preference learning.
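To make the claim concrete, here is a minimal sketch of what stochastic negative sampling in a DPO objective could look like: instead of a fixed "rejected" item, each update draws negatives at random from a candidate pool and averages the standard DPO loss over them. All function and parameter names here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def dpo_loss_sampled_negatives(pi_pos, ref_pos, pi_neg_pool, ref_neg_pool,
                               beta=0.1, num_neg=4, rng=None):
    """DPO loss with stochastically sampled negatives (illustrative sketch).

    pi_pos, ref_pos:           (B,)   log-probs of the preferred item under
                                      the policy and reference models.
    pi_neg_pool, ref_neg_pool: (B, K) log-probs of K candidate negatives.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    B, K = pi_neg_pool.shape
    # Stochastic negative sampling: pick num_neg negatives per example.
    idx = rng.integers(0, K, size=(B, num_neg))
    rows = np.arange(B)[:, None]
    pi_neg = pi_neg_pool[rows, idx]
    ref_neg = ref_neg_pool[rows, idx]
    # Standard DPO margin: beta * (policy-vs-reference advantage of the
    # preferred item minus that of each sampled negative).
    margin = beta * ((pi_pos - ref_pos)[:, None] - (pi_neg - ref_neg))
    # -log(sigmoid(x)) = logaddexp(0, -x), computed stably; average over
    # the batch and the sampled negatives.
    return np.mean(np.logaddexp(0.0, -margin))
```

Raising the policy's log-probability on preferred items (or lowering it on the sampled negatives) drives this loss down, which is the mechanism by which the choice of negatives shapes what the model learns to prefer.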