Department of Computer Science, University of Toronto, Coolwei AI Lab
Jointly training LLMs to reason and refine their answers unlocks significant performance gains, outperforming standard policy optimization by up to 11.5 points on AIME.