This paper investigates how context length and evidence placement affect the fact verification accuracy of LLMs in a retrieval-augmented setting. The authors evaluated Llama-3.1, Qwen2.5, and Qwen3 models (7B to 70B parameters) on the HOVER, FEVEROUS, and ClimateFEVER datasets. They found that LLMs possess non-trivial parametric factual knowledge, that verification accuracy decreases as context length increases, and that placing evidence at the beginning or end of the prompt yields higher accuracy than placing it mid-context.
LLMs' fact-checking accuracy tanks as context grows, but strategically placing evidence at the start or end of prompts can mitigate this "lost in the middle" effect.
Large language models (LLMs) show strong reasoning abilities across diverse tasks, yet their performance on extended contexts remains inconsistent. While prior research has emphasized mid-context degradation in question answering, this study examines the impact of context in LLM-based fact verification. Using three datasets (HOVER, FEVEROUS, and ClimateFEVER) and five open-source models across different parameter sizes (7B, 32B, and 70B) and model families (Llama-3.1, Qwen2.5, and Qwen3), we evaluate both parametric factual knowledge and the impact of evidence placement across varying context lengths. We find that LLMs exhibit non-trivial parametric knowledge of factual claims and that their verification accuracy generally declines as context length increases. Similarly to what has been shown in previous works, in-context evidence placement plays a critical role, with accuracy being consistently higher when relevant evidence appears near the beginning or end of the prompt and lower when placed mid-context. These results underscore the importance of prompt structure in retrieval-augmented fact-checking systems.
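To make the evidence-placement setup concrete, here is a minimal sketch of how a verification prompt could be assembled with the relevant evidence at the start, middle, or end of a list of distractor passages. It is an illustration under stated assumptions, not the authors' pipeline: the function names `build_context` and `build_prompt`, the prompt wording, and the SUPPORTED/REFUTED label set are all hypothetical.

```python
from typing import List


def build_context(evidence: str, distractors: List[str], position: str) -> List[str]:
    """Return a new passage list with the evidence inserted at the given position.

    `position` is one of "start", "middle", or "end" (hypothetical names).
    The input list of distractors is not modified.
    """
    if position == "start":
        index = 0
    elif position == "middle":
        index = len(distractors) // 2
    elif position == "end":
        index = len(distractors)
    else:
        raise ValueError(f"unknown position: {position!r}")
    return distractors[:index] + [evidence] + distractors[index:]


def build_prompt(claim: str, evidence: str, distractors: List[str], position: str) -> str:
    """Assemble a fact-verification prompt (assumed wording and label set)."""
    passages = build_context(evidence, distractors, position)
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        f"Passages:\n{numbered}\n\n"
        f"Claim: {claim}\n"
        "Answer with SUPPORTED or REFUTED."
    )
```

Context length can then be varied by changing the number of distractor passages while the claim and evidence stay fixed, so that accuracy can be compared across positions at each length.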