The paper identifies "proxy failure" in LLM uncertainty estimation (UE) metrics: these metrics become unreliable in low-information regimes because they are derived from model behaviour rather than ground truth. To address this, the authors introduce Truth AnChoring (TAC), a post-hoc calibration method that maps raw UE scores to truth-aligned scores using noisy, few-shot supervision. Experiments demonstrate that TAC improves the calibration of UE metrics, leading to more reliable uncertainty estimates.
Uncertainty estimates from LLMs often fail when they're needed most – in low-information scenarios – but a simple post-hoc calibration can fix them.
Uncertainty estimation (UE) aims to detect hallucinated outputs of large language models (LLMs) to improve their reliability. However, UE metrics often exhibit unstable performance across configurations, which significantly limits their applicability. In this work, we formalise this phenomenon as proxy failure: most UE metrics originate from model behaviour rather than being explicitly grounded in the factual correctness of LLM outputs. We show that, as a consequence, UE metrics become non-discriminative precisely in low-information regimes. To alleviate this, we propose Truth AnChoring (TAC), a post-hoc calibration method that remedies UE metrics by mapping their raw scores to truth-aligned scores. Even with noisy, few-shot supervision, TAC supports learning well-calibrated uncertainty estimates and yields a practical calibration protocol. Our findings highlight the limitations of treating heuristic UE metrics as direct indicators of truth uncertainty, and position TAC as a necessary step toward more reliable uncertainty estimation for LLMs. The code repository is available at https://github.com/ponhvoan/TruthAnchor/.
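To make the idea of post-hoc calibration concrete, here is a minimal sketch in the spirit of TAC. It is not the paper's actual method; it illustrates one standard way to map raw UE scores to truth-aligned probabilities, namely Platt scaling fitted by gradient descent on a handful of noisy correctness labels. All function names and the toy data are invented for illustration.

```python
import math

def fit_platt(scores, labels, lr=0.1, steps=2000):
    """Fit p(correct) = sigmoid(a*score + b) to noisy few-shot labels
    by plain gradient descent on the logistic loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        ga = gb = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            ga += (p - y) * s / n  # gradient w.r.t. slope
            gb += (p - y) / n      # gradient w.r.t. intercept
        a -= lr * ga
        b -= lr * gb
    return a, b

def calibrate(score, a, b):
    """Map a raw UE score to a truth-aligned probability."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))

# Few-shot, noisy supervision: raw UE scores paired with (possibly
# mislabelled) correctness judgements -- note the flipped label at 0.5.
raw = [0.1, 0.2, 0.35, 0.5, 0.6, 0.8, 0.9]
lab = [0,   0,   1,    0,   1,   1,   1]

a, b = fit_platt(raw, lab)
print(calibrate(0.9, a, b) > calibrate(0.1, a, b))
```

The key property is that the learned map re-anchors the raw scores to (noisy) ground truth rather than trusting model behaviour directly; even with a mislabelled point, the monotone fit recovers a score ordering aligned with correctness.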