The paper investigates how the design of the confidence scale affects the metacognitive sensitivity of Large Language Models (LLMs) when they verbalize uncertainty. By systematically manipulating confidence scales along three dimensions (granularity, boundary placement, and range regularity), the study evaluates metacognitive sensitivity with meta-d'. The key finding is that a 0-20 scale consistently improves metacognitive efficiency over the standard 0-100 scale, underscoring how strongly scale design shapes the quality of verbalized uncertainty.
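As a concrete illustration of what these manipulations could look like at the prompt level, here is a minimal Python sketch. The prompt wording and the specific ranges are hypothetical stand-ins chosen to match the three named dimensions, not the paper's actual materials.

```python
# Hypothetical scale variants illustrating the three manipulated
# dimensions; the ranges below are illustrative, not the paper's.
SCALES = {
    "standard": (0, 100),   # baseline 0-100 scale
    "coarse": (0, 20),      # granularity: fewer available values
    "shifted": (50, 100),   # boundary placement: compressed lower bound
    "irregular": (0, 73),   # range regularity: non-round endpoint
}

def confidence_prompt(question: str, scale: str) -> str:
    """Append a scale-specific confidence request to a question."""
    lo, hi = SCALES[scale]
    return (
        f"{question}\n"
        f"Give your answer, then rate your confidence that it is correct "
        f"as an integer from {lo} to {hi}."
    )

print(confidence_prompt("What is the capital of Australia?", "coarse"))
```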
LLMs' uncertainty estimates are highly sensitive to the design of the confidence scale, with a 0-20 scale boosting metacognitive efficiency compared to the standard 0-100 scale.
Verbalized confidence, in which LLMs report a numerical certainty score, is widely used to estimate uncertainty in black-box settings, yet the confidence scale itself (typically 0--100) is rarely examined. We show that this design choice is not neutral. Across six LLMs and three datasets, verbalized confidence is heavily discretized, with more than 78% of responses concentrating on just three round-number values. To investigate this phenomenon, we systematically manipulate confidence scales along three dimensions (granularity, boundary placement, and range regularity) and evaluate metacognitive sensitivity using meta-d'. We find that a 0--20 scale consistently improves metacognitive efficiency over the standard 0--100 format, while boundary compression degrades performance and round-number preferences persist even under irregular ranges. These results demonstrate that confidence scale design directly affects the quality of verbalized uncertainty and should be treated as a first-class experimental variable in LLM evaluation.
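For readers who want to probe these effects on their own model outputs, the sketch below computes the discretization statistic the abstract describes (the fraction of responses landing on the most frequent confidence values) and a type-2 AUROC as a simple stand-in for metacognitive sensitivity. Note that the paper itself fits meta-d', which this sketch does not reproduce, and the synthetic data here is purely illustrative.

```python
import numpy as np

def top_k_mass(confidences, k=3):
    """Fraction of responses on the k most frequent confidence values.

    The abstract reports >78% of responses landing on just three
    round-number values under the standard 0-100 scale.
    """
    _, counts = np.unique(confidences, return_counts=True)
    return np.sort(counts)[-k:].sum() / counts.sum()

def type2_auroc(correct, confidences):
    """Type-2 AUROC: how well confidence separates correct from
    incorrect answers. A simpler proxy for metacognitive sensitivity
    than the meta-d' fit used in the paper.
    """
    correct = np.asarray(correct, dtype=bool)
    pos = np.asarray(confidences)[correct]    # confidence on correct trials
    neg = np.asarray(confidences)[~correct]   # confidence on error trials
    # P(random correct trial gets higher confidence than a random
    # error trial), counting ties as half.
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

# Illustrative synthetic data (hypothetical, not from the paper):
# confidences clumped on a few round numbers, as the abstract describes.
rng = np.random.default_rng(0)
correct = rng.random(500) < 0.7
conf_100 = np.where(correct, 90, 80) + rng.choice([0, 5, 10], 500)

print(f"top-3 mass:   {top_k_mass(conf_100):.2f}")
print(f"type-2 AUROC: {type2_auroc(correct, conf_100):.2f}")
```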