This paper introduces a single-pass uncertainty quantification method for detecting LLM hallucinations based on attention divergence. The method measures the KL divergence between each attention head's distribution and a uniform distribution, then trains a logistic regression probe on these features to predict answer correctness. Experiments across datasets, tasks, and model families show that attention divergence is predictive of correctness and competitive with existing uncertainty estimation methods, with the signal concentrated in middle layers and on factual tokens.
Attention heads hold the key to detecting LLM hallucinations, offering a lightweight, white-box alternative to expensive sampling or external models.
We propose a lightweight, single-pass uncertainty quantification method for detecting hallucinations in Large Language Models. The method uses attention matrices to estimate uncertainty without requiring repeated sampling or external models. Specifically, we measure the Kullback-Leibler divergence between each attention head's distribution and a uniform reference distribution, and use these divergences as features in a logistic regression probe. Across multiple datasets, task types, and model families, attention divergence is highly predictive of answer correctness and performs competitively with existing uncertainty estimation methods. We find that this signal is concentrated in middle layers and on factual tokens such as named entities and numbers, suggesting that attention dynamics provide an efficient and interpretable white-box signal of model uncertainty.
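The feature construction described above can be illustrated with a minimal pure-Python sketch. It rests on several assumptions that the abstract does not confirm: the divergence is taken as KL(p || u) with p the head's attention row and u uniform over the attended positions; each head contributes a single row (for example, the final query position); and a plain gradient-descent logistic regression stands in for whatever solver the authors used. All function names and the toy inputs are hypothetical and serve only to show the mechanics.

```python
import math

def kl_to_uniform(p):
    """KL(p || u) for a probability vector p and uniform u over len(p) items."""
    n = len(p)
    # KL(p || u) = log(n) - H(p); zero-probability entries contribute nothing.
    return math.log(n) + sum(pi * math.log(pi) for pi in p if pi > 0.0)

def head_features(attn):
    """Map attn[layer][head] (one attention row per head) to one KL value per head."""
    return [kl_to_uniform(row) for layer in attn for row in layer]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_probe(X, y, lr=0.5, epochs=500):
    """Fit a logistic regression probe with batch gradient descent (assumed solver)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, t in zip(X, y):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - t
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_correct(w, b, x):
    """Probe output: estimated probability that the answer is correct."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))

# Synthetic toy inputs (not data from the paper): 1 layer, 2 heads, 4 positions.
# The labels are arbitrary and chosen only to exercise the training loop.
peaked = [[[0.97, 0.01, 0.01, 0.01], [0.7, 0.1, 0.1, 0.1]]]
diffuse = [[[0.25, 0.25, 0.25, 0.25], [0.3, 0.3, 0.2, 0.2]]]
X = [head_features(peaked), head_features(diffuse)]
y = [1, 0]
w, b = train_probe(X, y)
print(predict_correct(w, b, X[0]), predict_correct(w, b, X[1]))
```

Under this definition, a perfectly uniform attention row yields a divergence of zero and a one-hot row yields log(n), so each feature is bounded and grows as the head's attention becomes more concentrated. In a real setting the rows would come from a model's attention outputs, and how they are aggregated over tokens is a design choice this sketch leaves open.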