University of Chinese Academy of Sciences, Chinese Academy of Sciences
Jointly training MTP and RL doesn't have to hurt: a simple coefficient calibration scheme unlocks performance gains on mathematical reasoning tasks.