CAS State Key Laboratory of AI Safety | Apr 7, 2026 | arXiv:2604.06163

Data, Not Model: Explaining Bias toward LLM Texts in Neural Retrievers

Wei Huang, Keping Bi, Yinqiong Cai, Jiafeng Guo, Xueqi Cheng

AI Summary

This paper investigates why neural retrievers exhibit source bias, favoring LLM-generated text over human-written text even when the two are semantically similar, and finds that the bias originates from artifacts in the training data rather than from inherent model flaws. The authors demonstrate that non-semantic differences (fluency, term specificity) between positive and negative training examples mirror the differences between LLM and human text, and that retrievers learn these artifacts during contrastive learning. They propose and validate two mitigation strategies: reducing artifact differences in the training data, and adjusting LLM text vectors by removing their projection on the bias vector.

Key Contribution

Neural retrievers' preference for LLM-generated text is not an inherent flaw but a bias learned from artifacts in the training data, which offers a path to debiasing without architectural changes.

Abstract

Recent studies show that neural retrievers often display source bias, favoring passages generated by LLMs over human-written ones, even when both are semantically similar. This bias has been considered an inherent flaw of retrievers, raising concerns about the fairness and reliability of modern information access systems. Our work challenges this view by showing that source bias stems from supervision in retrieval datasets rather than from the models themselves. We find that non-semantic differences, such as fluency and term specificity, exist between positive and negative documents, mirroring the differences between LLM and human texts. In the embedding space, the bias direction from negatives to positives aligns with the direction from human-written to LLM-generated texts. We theoretically show that retrievers inevitably absorb the artifact imbalances in the training data during contrastive learning, which leads to their preference for LLM texts. To mitigate the effect, we propose two approaches: 1) reducing artifact differences in training data and 2) adjusting LLM text vectors by removing their projection on the bias vector. Both methods substantially reduce source bias. We hope our study alleviates some concerns regarding LLM-generated texts in information access systems.
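
The second mitigation in the abstract, removing a vector's projection on the bias vector, can be illustrated with a minimal sketch. The abstract does not say how the bias vector is estimated, so the version below assumes it is the difference between the mean embedding of LLM-generated texts and the mean embedding of human-written texts. The function names (`estimate_bias_vector`, `remove_projection`) and the toy two-dimensional embeddings are hypothetical and are not taken from the paper.

```python
from __future__ import annotations


def dot(a: list[float], b: list[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def estimate_bias_vector(
    llm_vecs: list[list[float]], human_vecs: list[list[float]]
) -> list[float]:
    """ASSUMPTION: bias direction = mean(LLM embeddings) - mean(human embeddings).

    The paper may estimate this direction differently.
    """
    llm_mean = mean_vector(llm_vecs)
    human_mean = mean_vector(human_vecs)
    return [l - h for l, h in zip(llm_mean, human_mean)]


def remove_projection(vec: list[float], bias: list[float]) -> list[float]:
    """Return vec minus its orthogonal projection onto bias.

    v' = v - (<v, b> / <b, b>) * b, so v' has no component along b.
    """
    norm_sq = dot(bias, bias)
    if norm_sq == 0.0:
        # A zero bias vector defines no direction; leave the vector unchanged.
        return list(vec)
    scale = dot(vec, bias) / norm_sq
    return [x - scale * b for x, b in zip(vec, bias)]


if __name__ == "__main__":
    # Toy embeddings, purely illustrative.
    llm_embeddings = [[2.0, 1.0], [2.0, 3.0]]
    human_embeddings = [[0.0, 1.0], [0.0, 3.0]]
    bias = estimate_bias_vector(llm_embeddings, human_embeddings)
    debiased = remove_projection([3.0, 4.0], bias)
    print(bias, debiased)
```

After the adjustment, the vector is orthogonal to the estimated bias direction, so an inner-product relevance score no longer gains or loses anything from that direction alone. Whether the adjustment should be applied only to LLM text vectors, as the abstract states, or to all document vectors is a deployment choice this sketch does not settle.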

Data Curation & Synthetic Data | Natural Language Processing | Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 41

Year: 2026

Venue: N/A
