Temple | May 6, 2026 | arXiv:2605.05166

The First Token Knows: Single-Decode Confidence for Hallucination Detection

AI Summary

The paper introduces $\phi_{first}$, a confidence score for hallucination detection in LLMs, computed from the normalized entropy of the top-K logits at the first content-bearing answer token of a single greedy decode. On closed-book short-answer factual question answering, it matches or modestly exceeds semantic self-consistency while avoiding the cost of repeated sampling and external natural language inference. Across three 7-8B instruction-tuned models and two benchmarks, $\phi_{first}$ achieves a mean AUROC of 0.820, compared with 0.793 for semantic agreement and 0.791 for surface-form self-consistency, suggesting that much of the uncertainty information captured by multi-sample agreement is already present in the model's initial token distribution.

Key Contribution

On closed-book short-answer factual question answering, hallucination detection from a single greedy decode can match or modestly exceed expensive multi-sample self-consistency methods.

Abstract

Self-consistency detects hallucinations by generating multiple sampled answers to a question and measuring agreement, but this requires repeated decoding and can be sensitive to lexical variation. Semantic self-consistency improves this by clustering sampled answers by meaning using natural language inference, but it adds both sampling cost and external inference overhead. We show that first-token confidence, phi_first, computed from the normalized entropy of the top-K logits at the first content-bearing answer token of a single greedy decode, matches or modestly exceeds semantic self-consistency on closed-book short-answer factual question answering. Across three 7-8B instruction-tuned models and two benchmarks, phi_first achieves a mean AUROC of 0.820, compared with 0.793 for semantic agreement and 0.791 for standard surface-form self-consistency. A subsumption test shows that phi_first is moderately to strongly correlated with semantic agreement, and combining the two signals yields only a small AUROC improvement over phi_first alone. These results suggest that much of the uncertainty information captured by multi-sample agreement is already available in the model's initial token distribution. We argue that phi_first should be reported as a default low-cost baseline before invoking sampling-based uncertainty estimation.
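The computation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `phi_first`, the default `k=10`, and the convention that confidence equals one minus the normalized entropy are all assumptions made for illustration, since the abstract does not state the exact formula or the value of K. The sketch also assumes the logits at the first content-bearing answer token have already been extracted; how that token is located is not specified in the abstract and is left out.

```python
import math
from typing import Sequence


def phi_first(logits: Sequence[float], k: int = 10) -> float:
    """Confidence in [0, 1] from the top-k logits at one token position.

    Assumption (not confirmed by the source): confidence is defined as
    1 - H / log(k), where H is the Shannon entropy of a softmax that is
    renormalized over the top-k logits only.
    """
    if k < 2:
        raise ValueError("k must be at least 2 so that log(k) > 0")

    # Keep only the k largest logits (fewer if the input is shorter).
    top = sorted(logits, reverse=True)[:k]
    if len(top) < 2:
        raise ValueError("need at least two logits")

    # Numerically stable softmax restricted to the retained logits.
    max_logit = top[0]
    exps = [math.exp(x - max_logit) for x in top]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Shannon entropy in nats; zero-probability terms contribute nothing.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)

    # Divide by the maximum possible entropy for this many candidates,
    # then flip so that a peaked distribution gives a high score.
    return 1.0 - entropy / math.log(len(top))
```

Under this convention, a distribution concentrated on one token scores close to 1 and a flat distribution over the top-K candidates scores close to 0, so a low score would flag a likely hallucination. In practice the `logits` argument would be the model's output at the relevant position of a greedy decode, for example one row of the per-step scores returned by a generation API.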

Eval Frameworks & Benchmarks | Natural Language Processing | Reasoning & Chain-of-Thought

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
