This paper investigates training LLMs to explicitly express uncertainty through two interfaces: a global verbalized confidence score and a local reasoning-time uncertainty signal. The authors find that verbalized confidence improves calibration and enables more selective retrieval in Adaptive RAG, while reasoning-time signaling exposes previously silent failures and improves retrieval triggering. The study further shows that verbalized confidence refines how existing uncertainty is decoded, whereas reasoning-time signaling induces a broader reorganization of the model's late layers, suggesting that uncertainty expression should be trained to match the task.
Explicitly training LLMs to verbalize confidence scores and signal reasoning-time uncertainty unlocks better calibration, failure detection, and control in retrieval-augmented generation.
Large language models are increasingly used in settings where uncertainty must drive decisions such as abstention, retrieval, and verification. Most existing methods treat uncertainty as a latent quantity to estimate after generation rather than a signal the model is trained to express. We instead study uncertainty as an interface for control. We compare two complementary interfaces: a global interface, where the model verbalizes a calibrated confidence score for its final answer, and a local interface, where the model emits an explicit marker during reasoning when it enters a high-risk state. These interfaces provide different but complementary benefits. Verbalized confidence substantially improves calibration, reduces overconfident errors, and yields the strongest overall Adaptive RAG controller while using retrieval more selectively. Reasoning-time uncertainty signaling makes previously silent failures visible during generation, improves wrong-answer coverage, and provides an effective high-recall retrieval trigger. Our findings further show that the two interfaces work differently internally: verbal confidence mainly refines how existing uncertainty is decoded, whereas reasoning-time signaling induces a broader late-layer reorganization. Together, these results suggest that effective uncertainty in LLMs should be trained as task-matched communication: global confidence for deciding whether to trust a final answer, and local signals for deciding when intervention is needed.
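To make the two interfaces concrete, here is a minimal sketch of how a controller might consume them in an Adaptive RAG loop. Everything here is an assumption for illustration, not the paper's implementation: the `Confidence: 0.85` output format, the `<uncertain>` reasoning-time marker, the threshold `tau_global`, and the placeholder `generate`/`retrieve` callables are all hypothetical.

```python
import re
from typing import Callable, Optional

# Hypothetical output conventions (not from the paper):
# the model ends its answer with "Confidence: <score in [0, 1]>"
# and may emit an "<uncertain>" marker mid-reasoning.
CONF_RE = re.compile(r"Confidence:\s*(0(?:\.\d+)?|1(?:\.0+)?)")
MARKER = "<uncertain>"


def parse_confidence(answer: str) -> Optional[float]:
    """Extract the verbalized confidence score, if present."""
    m = CONF_RE.search(answer)
    return float(m.group(1)) if m else None


def adaptive_rag_step(
    question: str,
    generate: Callable[..., str],   # placeholder LLM call
    retrieve: Callable[[str], str], # placeholder retriever
    tau_global: float = 0.6,        # assumed confidence threshold
) -> str:
    """One round of uncertainty-gated generation.

    Local interface: an uncertainty marker in the reasoning trace
    triggers retrieval immediately (high-recall trigger).
    Global interface: a low verbalized confidence on the final
    answer triggers a retrieval-augmented regeneration.
    """
    draft = generate(question)

    # Local signal: the model flagged a high-risk state while reasoning.
    if MARKER in draft:
        return generate(question, context=retrieve(question))

    # Global signal: final-answer confidence below the trust threshold.
    conf = parse_confidence(draft)
    if conf is not None and conf < tau_global:
        return generate(question, context=retrieve(question))

    return draft  # confident enough: skip retrieval, use retrieval selectively
```

The sketch mirrors the division of labor described above: the global score decides whether a finished answer can be trusted as-is, while the local marker decides, during generation, that intervention (here, retrieval) is needed.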