According to a new unified framework, DPO's classification-style loss, often treated as distinct from RL, is in fact closely connected to RL algorithms such as PPO.
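To make the classification framing concrete, here is a minimal sketch of the standard DPO objective for a single preference pair, `-log sigmoid(beta * (log-ratio of chosen - log-ratio of rejected))`. The function name, the toy log-probabilities, and the `beta` value are illustrative assumptions, not part of the framework described above; real implementations compute these log-probabilities with the policy and a frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of a response
    under the trainable policy or the frozen reference model.
    """
    # Implicit-reward margin: how much the policy upweights the
    # chosen response relative to the rejected one, vs. the reference.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # Binary-classification (logistic) loss on the margin.
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# With no margin the loss sits at log(2); a positive margin
# (policy prefers the chosen response) drives it below log(2).
print(dpo_loss(-1.0, -2.0, -1.5, -1.5))
```

The logistic loss is exactly why DPO looks like classification, while the `beta`-scaled log-ratios are the implicit reward that ties it back to KL-regularized RL objectives like PPO's.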