LLMs trained with reinforcement learning become overconfident in wrong answers due to a fundamental conflict between accuracy and calibration objectives, but this can be fixed by decoupling these objectives during training.
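The conflict can be illustrated with a toy sketch (everything here is illustrative, not the paper's actual method): a single stated-confidence parameter is trained by gradient descent on a Brier calibration loss, with or without an added RL-style reward term that pays out for high confidence. Blending the reward into the same objective drags confidence toward 1 regardless of accuracy; training the confidence on the calibration loss alone recovers the true accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)
ACC = 0.6                             # toy model is right 60% of the time
correct = rng.random(10_000) < ACC    # simulated per-question outcomes

def fit_confidence(use_reward_term):
    """Gradient-descend a scalar stated confidence c = sigmoid(theta).

    Loss per example: Brier term (c - y)^2, plus (optionally) a
    reward term -c that mimics an RL objective paying for confidence.
    """
    theta, lr = 0.0, 0.05
    for y in correct:
        c = 1.0 / (1.0 + np.exp(-theta))
        grad = 2.0 * (c - y) * c * (1.0 - c)   # d/dtheta of (c - y)^2
        if use_reward_term:
            grad += -c * (1.0 - c)             # d/dtheta of -c: pushes c -> 1
        theta -= lr * grad
    return 1.0 / (1.0 + np.exp(-theta))

# Blended objective: confidence inflates far above true accuracy.
print(fit_confidence(use_reward_term=True))
# Decoupled (calibration-only) objective: confidence tracks accuracy (~0.6).
print(fit_confidence(use_reward_term=False))
```

The decoupled run converges to the base rate because the Brier loss is minimized exactly when stated confidence equals the empirical accuracy; the blended run has no interior fixed point, so confidence drifts toward 1.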