L. Yu and Z. Chang are with the School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China. Z. Chang is also with the Faculty of Information Technology, University of Jyväskylä, FIN-40014 Jyväskylä, Finland. Y-C. Liang is with the Center for Intelligent Networking and Communications (CINC), University of Electronic Science and Technology of China, Chengdu 611731, China. Accepted for publication in IEEE Communications Magazine.
By explicitly modeling and calibrating a model's intrinsic uncertainty, EGPO unlocks significant gains in reasoning performance for RL-trained language models.