This paper investigates citation hallucination in LLMs, finding that author names are the most frequent source of error. The authors show that signals for hallucinating different citation fields do not generalize across fields, and identify specific "FH-neurons" in Qwen2.5-32B-Instruct associated with field-specific hallucination, using neuron-level CETT values and elastic-net regularization. Causal interventions targeting these neurons demonstrate that suppressing them reduces hallucination across citation fields.
LLMs have "hallucination neurons" for specific citation fields, and silencing them reduces fabrication.
LLMs frequently generate fictitious yet convincing citations, often expressing high confidence even when the underlying reference is wrong. We study this failure across 9 models and 108,000 generated references, and find that author names are hallucinated far more often than any other field, across all models and settings. Citation style has no measurable effect on hallucination rates, while reasoning-oriented distillation degrades recall. Probes trained on one field transfer at near-chance levels to the others, suggesting that hallucination signals do not generalize across fields. Building on this finding, we apply elastic-net regularization with stability selection to neuron-level CETT values in Qwen2.5-32B-Instruct and identify a sparse set of field-specific hallucination neurons (FH-neurons). Causal intervention confirms their role: amplifying these neurons increases hallucination, while suppressing them improves performance across fields, with larger gains in some fields than others. These results suggest a lightweight approach to detecting and mitigating citation hallucination using internal model signals alone.
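To make the selection step concrete, here is a minimal sketch of elastic-net regularization with stability selection over neuron-level CETT features. Everything in it is an illustrative assumption rather than the paper's configuration: the data is synthetic, and the feature matrix `X` (one CETT value per neuron per example), the labels `y` (1 if the citation field was hallucinated), the subsample count, and the 0.8 selection threshold are all placeholders.

```python
# Sketch: elastic-net + stability selection over per-neuron CETT features.
# X: (n_examples, n_neurons) hypothetical CETT values; y: hallucination labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_examples, n_neurons = 2000, 512            # toy sizes, not the paper's
X = rng.normal(size=(n_examples, n_neurons))
# Synthetic labels driven by the first 5 "neurons" so selection has a target.
y = (X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=n_examples) > 0).astype(int)

n_runs, keep = 50, np.zeros(n_neurons)
for _ in range(n_runs):
    # Refit an elastic-net classifier on a random half of the data.
    idx = rng.choice(n_examples, size=n_examples // 2, replace=False)
    clf = LogisticRegression(
        penalty="elasticnet", solver="saga",
        l1_ratio=0.5, C=0.1, max_iter=2000,
    ).fit(X[idx], y[idx])
    keep += (np.abs(clf.coef_[0]) > 1e-6)    # neuron survived this subsample

# Keep only neurons selected in at least 80% of subsampled fits.
stable = np.where(keep / n_runs >= 0.8)[0]
print("candidate FH-neurons:", stable)
```

Requiring a neuron's coefficient to survive many subsampled fits, rather than a single fit, is what makes the resulting FH-neuron set sparse and reproducible.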
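The intervention side can be sketched as a forward hook that rescales selected MLP activations during generation, assuming FH-neurons correspond to units of one block's gated MLP activation (the `mlp.act_fn` module layout of Qwen2 models in `transformers`). The layer index, neuron indices, and scaling factor below are hypothetical; the paper's exact intervention may differ.

```python
# Sketch: suppress (scale=0.0) or amplify (scale>1.0) selected MLP neurons.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-32B-Instruct"     # a smaller Qwen2.5 also works for testing
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

fh_neurons = torch.tensor([13, 472, 901])    # hypothetical FH-neuron indices
scale = 0.0                                  # 0.0 suppresses; >1.0 amplifies

def intervene(module, inputs, output):
    # Rescale the chosen hidden units of this MLP's activation output.
    output[..., fh_neurons] *= scale
    return output

# Hook the activation of one transformer block's MLP (assumed module path).
handle = model.model.layers[20].mlp.act_fn.register_forward_hook(intervene)

prompt = "Cite one paper on citation hallucination in LLMs."
out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()                              # restore the unmodified model
```

Running the same prompt with `scale = 0.0` versus `scale > 1.0` and comparing hallucination rates is the shape of the causal test described in the abstract: suppression should reduce fabricated fields, amplification should increase them.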