Nanyang Technological University
Reinforcement learning (RL) can yield better compositional generalization than supervised fine-tuning because it directly optimizes for correct outcomes, especially on complex tasks where supervised models overfit.