China Academy of Space Technology, Beijing, China
Pointwise reward models can finally compete with pairwise models in RLHF, thanks to a new intergroup comparison method that scales linearly with the number of candidates.
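The summary does not describe how the intergroup comparison works, so the sketch below does not attempt it. It only illustrates what "scales linearly with the number of candidates" means in terms of reward-model calls. It assumes a hypothetical pairwise baseline that compares every unordered pair of candidates (round-robin), which is one common setup but not necessarily the one used here. The function names are hypothetical.

```python
from itertools import combinations


def pointwise_calls(candidates):
    """Pointwise: one reward-model call per candidate, each response scored independently."""
    return len(candidates)


def round_robin_pairwise_calls(candidates):
    """Hypothetical all-pairs baseline: one reward-model call per unordered pair."""
    return sum(1 for _ in combinations(candidates, 2))


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        cands = [f"response_{i}" for i in range(n)]
        # Columns: number of candidates, pointwise calls, round-robin pairwise calls
        print(n, pointwise_calls(cands), round_robin_pairwise_calls(cands))
```

Running it prints `4 4 6`, `8 8 28`, `16 16 120`, and `32 32 496`. Under this assumption, pointwise cost grows as n while all-pairs cost grows as n(n-1)/2. Cheaper pairwise schemes such as knockout brackets also exist, so the gap depends on how the pairwise baseline is actually run.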