ADPO (Anchored Direct Preference Optimization) is a unified mathematical framework that generalizes DPO in three ways: soft preference probabilities that encode confidence and uncertainty; arbitrary reference policy anchors that stabilize optimization through groupwise shift invariance; and groupwise (listwise) preference modeling via Plackett–Luce distributions.
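The three ingredients named above can be illustrated with a small, self-contained sketch. It is not the paper's objective. It shows only the textbook building blocks under stated assumptions: scores are taken to be DPO-style anchored log-ratios `beta * (log pi - log pi_anchor)`, the soft-preference loss is ordinary cross-entropy against a target probability `q`, and "groupwise shift invariance" is read as the fact that adding a constant to every score in a group leaves the Plackett–Luce probability unchanged. All function names and the `beta` parameter are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def anchored_scores(policy_logps, anchor_logps, beta=1.0):
    """Assumed DPO-style scores: scaled log-ratio of policy to anchor."""
    return [beta * (p - a) for p, a in zip(policy_logps, anchor_logps)]


def soft_pairwise_loss(score_a, score_b, q):
    """Cross-entropy between a soft target q = P(a preferred over b)
    and the Bradley-Terry probability sigmoid(score_a - score_b).
    q = 1.0 recovers the usual hard-label pairwise loss."""
    margin = score_a - score_b
    log_p = -softplus(-margin)          # log sigmoid(margin)
    log_one_minus_p = -softplus(margin)  # log (1 - sigmoid(margin))
    return -(q * log_p + (1.0 - q) * log_one_minus_p)


def plackett_luce_log_prob(scores, ranking):
    """Log-probability of `ranking` (item indices, best first) under
    a Plackett-Luce model with the given per-item scores."""
    remaining = list(ranking)
    total = 0.0
    for idx in ranking:
        # Probability of picking idx first among the items still remaining.
        total += scores[idx] - log_sum_exp([scores[j] for j in remaining])
        remaining.remove(idx)
    return total


# Example: three candidate responses for one prompt (made-up log-probs).
scores = anchored_scores([-3.0, -5.0, -2.5], [-3.5, -4.0, -4.5], beta=1.0)
shifted = [s + 3.7 for s in scores]  # same constant added to the whole group
ranking = [2, 0, 1]
print(plackett_luce_log_prob(scores, ranking))
print(plackett_luce_log_prob(shifted, ranking))  # agrees up to float error
```

With two items, the Plackett–Luce probability reduces to the pairwise sigmoid, which is why the listwise form can be viewed as a generalization of the pairwise one.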