† Equal contribution. * Corresponding author. Emails: shen.tao5@zte.com.cn. All authors are with ZTE Corporation, China.
By explicitly modeling the latent human evaluation process, VRM offers a more robust reward model, sidestepping the pitfalls of spurious correlations that plague traditional methods.