By recognizing that not all tokens are created equal, D2PO adds a simple temporal weighting to DPO that improves alignment benchmark scores by up to 9.7 points over standard DPO.
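One way to picture a "temporal weighting" inside DPO is to scale each token's policy-vs-reference log-ratio by a decay factor γ^t, so earlier tokens count more toward the implicit reward than later ones. The sketch below assumes that exponential form; it is an illustration, not D2PO's reference implementation. The function names `decayed_reward` and `decayed_dpo_loss` and the default values of `beta` and `gamma` are hypothetical.

```python
import math
from typing import Sequence


def _softplus(x: float) -> float:
    """Numerically stable log(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def decayed_reward(policy_logps: Sequence[float],
                   ref_logps: Sequence[float],
                   beta: float,
                   gamma: float) -> float:
    """Implicit reward with an ASSUMED exponential temporal decay:
    beta * sum_t gamma**t * (log pi_theta(y_t) - log pi_ref(y_t)).
    With gamma == 1.0 this is the ordinary sequence-level DPO reward."""
    if len(policy_logps) != len(ref_logps):
        raise ValueError("per-token log-prob sequences must have equal length")
    return beta * sum(
        (gamma ** t) * (p - r)
        for t, (p, r) in enumerate(zip(policy_logps, ref_logps))
    )


def decayed_dpo_loss(chosen_policy: Sequence[float],
                     chosen_ref: Sequence[float],
                     rejected_policy: Sequence[float],
                     rejected_ref: Sequence[float],
                     beta: float = 0.1,     # hypothetical default
                     gamma: float = 0.98    # hypothetical default
                     ) -> float:
    """DPO loss -log sigmoid(r_chosen - r_rejected) using decayed rewards."""
    margin = (decayed_reward(chosen_policy, chosen_ref, beta, gamma)
              - decayed_reward(rejected_policy, rejected_ref, beta, gamma))
    # -log sigmoid(m) == softplus(-m), computed stably
    return _softplus(-margin)
```

Setting `gamma=1.0` recovers plain DPO, so the decay acts as a single extra knob: values below 1 shrink the influence of tokens late in the response on the preference margin.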