USC Physical Superintelligence (PSI) Lab
DPO's classification loss, often viewed as distinct from RL, is closely connected to RL algorithms such as PPO, according to a new unified framework.