LLMs learn better from AI *reward* than AI *preference*, leading to higher human-AI agreement and improved performance compared to standard online AI feedback and RLHF.