LLMs trained with reinforcement learning become overconfident in wrong answers due to a fundamental conflict between accuracy and calibration objectives, but this can be fixed by decoupling these objectives during training.
By grounding reflection in the visual artifacts of presentation slides, DeepPresenter enables agents to iteratively refine presentations in a way that internal reasoning traces alone cannot.