Supervised fine-tuning can be dramatically improved by explicitly encouraging exploration of low-confidence data and suppressing high-confidence errors, and the resulting gains in mathematical reasoning persist even after extensive training with reinforcement learning with verifiable rewards (RLVR).