By explicitly modeling and calibrating a model's intrinsic uncertainty, EGPO yields significant gains in reasoning performance for language models trained with reinforcement learning (RL).