Institute of Software, Chinese Academy of Sciences
LLMs trained with reinforcement learning become overconfident in their wrong answers because the accuracy objective and the calibration objective fundamentally conflict; the overconfidence can be fixed by decoupling the two objectives during training.
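
To make the idea of decoupling concrete, here is a minimal PyTorch sketch. Everything in it is an assumption for illustration: the `decoupled_losses` function, the separate confidence head, and the specific loss choices (a policy-gradient-style accuracy term and a cross-entropy calibration term) are hypothetical, not the paper's actual method.

```python
import torch
import torch.nn.functional as F

def decoupled_losses(answer_logits, confidence_logits, is_correct):
    """Compute an accuracy-driven loss and a calibration loss separately.

    answer_logits:     (batch, vocab) scores the model uses to pick answers.
    confidence_logits: (batch,) output of a separate confidence head.
    is_correct:        (batch,) 0/1 labels for whether each answer was right.
    """
    # Accuracy objective: a simple policy-gradient-style placeholder that
    # pushes up the log-probability of answers that turned out correct.
    chosen_logprob = F.log_softmax(answer_logits, dim=-1).max(dim=-1).values
    reward = is_correct.float() * 2.0 - 1.0      # +1 correct, -1 wrong
    accuracy_loss = -(reward * chosen_logprob).mean()

    # Calibration objective: make stated confidence track empirical accuracy.
    # In a full model the confidence head would share a backbone with the
    # answer head; detaching the shared features before this head is one way
    # to keep calibration gradients from distorting the answers themselves.
    confidence = torch.sigmoid(confidence_logits)
    calibration_loss = F.binary_cross_entropy(confidence, is_correct.float())

    return accuracy_loss, calibration_loss

# Usage: optimize the two losses separately (or with separate weights)
# instead of folding calibration into the scalar reward.
answer_logits = torch.randn(4, 10, requires_grad=True)
confidence_logits = torch.randn(4, requires_grad=True)
is_correct = torch.tensor([1, 0, 1, 1])
acc_loss, cal_loss = decoupled_losses(answer_logits, confidence_logits, is_correct)
(acc_loss + cal_loss).backward()
```

The point of the sketch is that the calibration term trains only the confidence pathway, so improving calibration never pressures the model toward or away from particular answers, whereas folding calibration into the reward would let the two objectives fight over the same parameters.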