Pointwise reward models can finally compete with pairwise models in RLHF, thanks to a new intergroup comparison method that scales linearly with the number of candidates.