This paper investigates how query rewriting techniques affect biases in dense retrievers within RAG systems, focusing on brevity, position, literal matching, and repetition biases. The authors evaluate five rewriting methods across six retrievers and find that simple LLM-based rewriting reduces aggregate bias by 54% but falters under adversarial conditions where multiple biases combine. Mechanistic analysis reveals that simple rewriting reduces bias by increasing score variance, while pseudo-document generation reduces bias through genuine decorrelation from bias-inducing features.
LLM-based query rewriting in RAG can reduce retrieval bias by over 50%, but breaks down when biases combine adversarially, revealing the limits of query-side interventions.
Dense retrievers in retrieval-augmented generation (RAG) systems exhibit systematic biases, including brevity, position, literal matching, and repetition biases, that can compromise retrieval quality. Query rewriting techniques are now standard in RAG pipelines, yet their impact on these biases remains unexplored. We present the first systematic study of how query enhancement techniques affect dense retrieval biases, evaluating five methods across six retrievers. Our findings reveal that simple LLM-based rewriting achieves the strongest aggregate bias reduction (54%), yet fails under adversarial conditions where multiple biases combine. Mechanistic analysis uncovers two distinct mechanisms: simple rewriting reduces bias through increased score variance, while pseudo-document generation methods achieve reduction through genuine decorrelation from bias-inducing features. However, no technique uniformly addresses all biases, and effects vary substantially across retrievers. Our results provide practical guidance for selecting query enhancement strategies based on specific bias vulnerabilities. More broadly, we establish a taxonomy distinguishing query-document interaction biases from document encoding biases, clarifying the limits of query-side interventions for debiasing RAG systems.
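
The distinction between the two mechanisms can be illustrated with a small sketch. The abstract does not state how bias is measured, so this example assumes a paired t-statistic over per-query score gaps (score of a bias-favored document minus score of a matched control document); the function name `paired_t_statistic` and all numbers are hypothetical and are not taken from the paper. Under that assumption, the statistic can shrink either because the scores become noisier while the mean gap stays the same, or because the mean gap itself shrinks.

```python
import math
import statistics


def paired_t_statistic(gaps):
    """Paired t-statistic of per-query score gaps.

    Each gap is assumed to be score(bias-favored doc) - score(control doc).
    This metric is an assumption for illustration, not the paper's stated one.
    """
    n = len(gaps)
    mean_gap = statistics.fmean(gaps)
    sample_sd = statistics.stdev(gaps)  # sample standard deviation (n - 1)
    return mean_gap / (sample_sd / math.sqrt(n))


# Hypothetical score gaps for five queries (illustrative values only).
baseline = [0.10, 0.12, 0.08, 0.11, 0.09]      # consistent preference for the biased doc
noisier = [0.30, -0.10, 0.25, -0.05, 0.10]     # same mean gap, much larger spread
decorrelated = [0.02, 0.04, 0.00, 0.03, 0.01]  # smaller mean gap, same spread as baseline

for name, gaps in [("baseline", baseline),
                   ("noisier", noisier),
                   ("decorrelated", decorrelated)]:
    print(f"{name}: mean gap = {statistics.fmean(gaps):.3f}, "
          f"t = {paired_t_statistic(gaps):.2f}")
```

In this toy setup both modified conditions lower the t-statistic relative to the baseline, but only the `decorrelated` case lowers the mean score gap. That is why reporting the mean gap alongside a variance-normalized statistic helps separate the two effects.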