Tired of DPO's length bias and probability degradation? LMPO offers a more robust and efficient alternative for preference-based RLHF, outperforming existing methods on Mistral and LLaMA3.