Shenzhen University of Advanced Technology, Shenzhen, China
Pointwise reward models can finally compete with pairwise models in RLHF, thanks to a new intergroup comparison method that scales linearly with the number of candidates.