Direct Preference Optimization (DPO), a popular alternative to RLHF, can actually *hurt* performance because of statistical misspecification, but a simple fix (AuxDPO) can bring it back on track.