Preference learning methods like RLHF and DPO are not as different as you might think: each is a particular set of choices along three key axes.