Columbia University
DPO might not be the only game in town: a decision-directed approach to reward modeling can outperform it in pairwise preference optimization.