LLMs trained with reinforcement learning from verifiable rewards (RLVR) become overconfident in their incorrect answers. A simple fix, decoupling the reasoning and calibration objectives, can restore proper calibration without sacrificing accuracy.
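To make "overconfident" concrete, the sketch below measures calibration with binned expected calibration error (ECE), a common metric that compares a model's stated confidence with its actual accuracy. It also reports the mean confidence on wrong answers. This is an illustrative assumption: the summary does not say which metric the authors use, and the function names `expected_calibration_error` and `mean_confidence_on_errors` are hypothetical. It does not implement the decoupling method itself.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Binned ECE: the sample-weighted mean of |accuracy - mean confidence| per bin.

    Hypothetical helper for illustration, not taken from the paper.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Map a confidence in [0, 1] to a bin index; conf == 1.0 goes to the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(1 for _, ok in b if ok) / len(b)
        # Weight each bin's confidence/accuracy gap by its share of samples.
        ece += (len(b) / n) * abs(avg_conf - acc)
    return ece


def mean_confidence_on_errors(
    confidences: Sequence[float], correct: Sequence[bool]
) -> float:
    """Average confidence on incorrect answers. A high value means confidently wrong.

    Hypothetical helper; returns 0.0 when there are no incorrect answers.
    """
    wrong = [c for c, ok in zip(confidences, correct) if not ok]
    return sum(wrong) / len(wrong) if wrong else 0.0


if __name__ == "__main__":
    outcomes = [True, False, True, False]  # 50% accuracy
    # Overconfident: 95% confidence on every answer, yet only half are right.
    print(expected_calibration_error([0.95] * 4, outcomes))  # about 0.45
    # Calibrated: 50% confidence matches 50% accuracy.
    print(expected_calibration_error([0.5] * 4, outcomes))  # 0.0
```

A model that reports 95% confidence while being right half the time has an ECE of about 0.45. A model with the same accuracy that reports 50% confidence scores 0. Under this metric, a calibration fix should lower ECE while leaving accuracy unchanged.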