Stop training your reward models on easy examples: MARS boosts reward modeling performance by focusing augmentation on the ambiguous preference pairs where the model struggles most.