Feb 23, 2026 · arXiv:2602.20300

What Makes a Good Query? Measuring the Impact of Human-Confusing Linguistic Features on LLM Performance

William Watson, Nicole Cho, Sumitra Ganesh, Manuela Veloso

AI Summary

This paper investigates the relationship between linguistic features of queries and the hallucination rates of large language models (LLMs). The authors construct a 22-dimensional query feature vector covering clause complexity, lexical rarity, anaphora, negation, answerability, and intention grounding, and correlate these features with hallucination rates across a dataset of 369,837 real-world queries. The analysis shows that features such as deep clause nesting and underspecification correlate with higher hallucination propensity, while clear intention grounding and answerability correlate with lower hallucination rates.
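The summary names the feature families but not how a feature vector is computed. As a rough illustration only, the sketch below turns a query into a small numeric vector using crude keyword proxies for negation, clause complexity (subordinating words), anaphora (pronouns and demonstratives), and lexical rarity (share of words outside a tiny common-word list). The word lists, the `QueryFeatures` fields, and the `extract_features` function are hypothetical stand-ins invented for this example; they are not the authors' 22 features or their extraction pipeline, and a serious implementation would likely use a syntactic parser and a corpus frequency list instead of keyword matching.

```python
import re
from dataclasses import dataclass, astuple

# Hypothetical word lists for illustration only; NOT the paper's lexicons.
NEGATIONS = {"not", "no", "never", "none", "nobody", "nothing", "neither", "nor"}
SUBORDINATORS = {"because", "although", "which", "who", "whom", "whose",
                 "if", "when", "while", "unless", "since", "whereas"}
ANAPHORS = {"it", "its", "they", "them", "their", "this", "these", "those",
            "he", "she", "him", "her"}
COMMON_WORDS = {"the", "a", "an", "is", "are", "was", "what", "how", "why",
                "of", "to", "in", "and", "or", "for", "on", "with", "do",
                "does", "i", "you", "can", "be"}


@dataclass
class QueryFeatures:
    n_tokens: int
    negation: int         # count of negation cues, including "n't" contractions
    subordination: int    # count of subordinating words (crude clause-complexity proxy)
    anaphora: int         # count of pronouns/demonstratives (crude anaphora proxy)
    rare_fraction: float  # share of tokens outside COMMON_WORDS (crude rarity proxy)

    def to_vector(self) -> list[float]:
        """Flatten the features into a numeric vector in field order."""
        return [float(x) for x in astuple(self)]


def extract_features(query: str) -> QueryFeatures:
    """Compute the toy feature vector for one query (hypothetical helper)."""
    # Lowercase word tokens, keeping contractions such as "didn't" whole.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", query.lower())
    n = len(tokens)
    negation = sum(1 for t in tokens if t in NEGATIONS or t.endswith("n't"))
    subordination = sum(1 for t in tokens if t in SUBORDINATORS)
    anaphora = sum(1 for t in tokens if t in ANAPHORS)
    rare = sum(1 for t in tokens if t not in COMMON_WORDS)
    return QueryFeatures(
        n_tokens=n,
        negation=negation,
        subordination=subordination,
        anaphora=anaphora,
        rare_fraction=rare / n if n else 0.0,  # avoid division by zero on empty input
    )


if __name__ == "__main__":
    feats = extract_features("Why didn't it fail if the input was empty?")
    print(feats.to_vector())
```

Keyword counts are chosen here only because they are transparent and dependency-free; they miss nesting depth entirely (two subordinators in sequence and two nested ones score the same), which is exactly the distinction the paper's clause-complexity features are meant to capture.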

Key Contribution

LLM hallucinations are not just a property of the model: query features such as clause complexity, underspecification, and intention grounding are consistently associated with how likely a model is to hallucinate.

Abstract

Large Language Model (LLM) hallucinations are usually treated as defects of the model or its decoding strategy. Drawing on classical linguistics, we argue that a query's form can also shape a listener's (and model's) response. We operationalize this insight by constructing a 22-dimensional query feature vector covering clause complexity, lexical rarity, anaphora, negation, answerability, and intention grounding, all known to affect human comprehension. Using 369,837 real-world queries, we ask: are there certain types of queries that make hallucination more likely? A large-scale analysis reveals a consistent "risk landscape": certain features, such as deep clause nesting and underspecification, align with higher hallucination propensity, whereas clear intention grounding and answerability align with lower hallucination rates. Others, including domain specificity, show mixed, dataset- and model-dependent effects. These findings establish an empirically observable query-feature representation correlated with hallucination risk, paving the way for guided query rewriting and future intervention studies.
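One way to picture the abstract's "risk landscape" is as a per-feature association score between query features and a binary hallucination outcome. A minimal sketch, assuming each query already has numeric feature values and a 0/1 hallucination label, is to compute the Pearson correlation of each feature column with the label (with a binary label this is the point-biserial correlation) and rank features by it. The function names `pearson` and `risk_landscape`, the toy data, and the convention of scoring a constant column as 0.0 are illustrative assumptions; the paper's actual statistical analysis may differ, and a correlation of this kind says nothing about causation.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; with a 0/1 ys this equals the point-biserial correlation."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # assumed convention: a constant column carries no measurable association
    return sxy / math.sqrt(sxx * syy)


def risk_landscape(features: dict[str, Sequence[float]],
                   hallucinated: Sequence[int]) -> list[tuple[str, float]]:
    """Rank features by correlation with the hallucination label (positive = riskier)."""
    ys = [float(y) for y in hallucinated]
    scores = [(name, pearson(col, ys)) for name, col in features.items()]
    return sorted(scores, key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy, made-up data for illustration; NOT results from the paper.
    toy_features = {
        "clause_depth": [0, 1, 2, 3],
        "intention_grounded": [1, 1, 0, 0],
    }
    toy_labels = [0, 0, 1, 1]  # 1 = the model's answer was judged a hallucination
    for name, r in risk_landscape(toy_features, toy_labels):
        print(f"{name}: {r:+.3f}")
```

Positive scores flag features that co-occur with hallucination in the data and negative scores flag features associated with fewer hallucinations, matching the direction the abstract reports for clause nesting versus intention grounding. On real data one would also want confidence intervals and a breakdown by dataset and model, since the abstract notes that some effects are dataset- and model-dependent.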

Eval Frameworks & Benchmarks · Natural Language Processing

Citation Metrics

Citations: 0
Influential citations: 0
References: 122
Year: 2026
Venue: N/A
