The paper investigates why reasoning prompts improve LLMs' ability to answer simple, single-hop factual questions, even though such questions do not require step-by-step reasoning. Through controlled experiments, the authors identify two mechanisms: a "computational buffer effect," in which the generated tokens support latent computation regardless of their semantic content, and "factual priming," in which generated related facts act as a semantic bridge to the correct answer. They also show that hallucinated intermediate facts during reasoning increase the likelihood of hallucinations in the final answer, and they propose mitigating this by prioritizing reasoning trajectories whose factual statements are hallucination-free.
Reasoning in LLMs isn't just for complex tasks: it can substantially improve recall of simple facts, but hallucinated reasoning steps can backfire and make a hallucinated final answer more likely.
While reasoning in LLMs plays a natural role in math, code generation, and multi-hop factual questions, its effect on simple, single-hop factual questions remains unclear. Such questions do not require step-by-step logical decomposition, making the utility of reasoning highly counterintuitive. Nevertheless, we find that enabling reasoning substantially expands the capability boundary of the model's parametric knowledge recall, unlocking correct answers that are otherwise effectively unreachable. Why does reasoning aid parametric knowledge recall when there are no complex reasoning steps to be done? To answer this, we design a series of hypothesis-driven controlled experiments, and identify two key driving mechanisms: (1) a computational buffer effect, where the model uses the generated reasoning tokens to perform latent computation independent of their semantic content; and (2) factual priming, where generating topically related facts acts as a semantic bridge that facilitates correct answer retrieval. Importantly, this latter generative self-retrieval mechanism carries inherent risks: we demonstrate that hallucinating intermediate facts during reasoning increases the likelihood of hallucinations in the final answer. Finally, we show that our insights can be harnessed to directly improve model accuracy by prioritizing reasoning trajectories that contain hallucination-free factual statements.
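The last idea in the abstract, prioritizing trajectories whose factual statements are hallucination-free, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe how statements are extracted or verified, so the `Trajectory` type, the `select_answer` function, the `is_supported` verifier, and the majority-vote fallback are all hypothetical names and assumptions introduced here for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trajectory:
    """One sampled reasoning trace (hypothetical structure)."""
    facts: List[str]  # factual statements found in the reasoning text
    answer: str       # final answer produced after the reasoning


def select_answer(
    trajectories: List[Trajectory],
    is_supported: Callable[[str], bool],
) -> str:
    """Vote over final answers, preferring hallucination-free trajectories.

    `is_supported` is an assumed external verifier that returns True when
    a statement is judged factually correct.
    """
    if not trajectories:
        raise ValueError("need at least one trajectory")

    # Keep only trajectories in which every stated fact passes the verifier.
    clean = [t for t in trajectories if all(is_supported(f) for f in t.facts)]

    # Assumption: if no trajectory is clean, fall back to all of them.
    pool = clean if clean else trajectories

    # Majority vote over the final answers in the chosen pool.
    votes = Counter(t.answer for t in pool)
    return votes.most_common(1)[0][0]


# Toy usage: a set of known-true statements stands in for the verifier.
known = {"Canberra is the seat of the Australian parliament"}
samples = [
    Trajectory(["Sydney is the capital of Australia"], "Sydney"),
    Trajectory(["Sydney is the capital of Australia"], "Sydney"),
    Trajectory(["Canberra is the seat of the Australian parliament"], "Canberra"),
]
chosen = select_answer(samples, lambda fact: fact in known)
```

In this toy case a plain majority vote over all three samples would pick "Sydney", while restricting the vote to trajectories with only supported statements picks "Canberra". How the verifier is built, and whether filtering or reweighting works better, is left open here.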