This paper investigates why neural retrievers exhibit source bias, favoring LLM-generated text over human-written text even when the two are semantically similar, and finds that the bias originates from artifacts in the training data rather than from inherent model flaws. The authors demonstrate that non-semantic differences (fluency, term specificity) between positive and negative training examples correlate with the differences between LLM and human text, and that retrievers learn these artifacts during contrastive learning. They propose and validate two mitigation strategies: reducing artifact differences in the training data and adjusting LLM text vectors to remove their projection on the bias vector.
Neural retrievers' preference for LLM-generated text is not an inherent flaw but a bias learned from artifacts in the training data, which opens a path to debiasing without architectural changes.
Recent studies show that neural retrievers often display source bias, favoring passages generated by LLMs over human-written ones, even when both are semantically similar. This bias has been considered an inherent flaw of retrievers, raising concerns about the fairness and reliability of modern information access systems. Our work challenges this view by showing that source bias stems from supervision in retrieval datasets rather than from the models themselves. We find that non-semantic differences, like fluency and term specificity, exist between positive and negative documents, mirroring differences between LLM and human texts. In the embedding space, the bias direction from negatives to positives aligns with the direction from human-written to LLM-generated texts. We theoretically show that retrievers inevitably absorb the artifact imbalances in the training data during contrastive learning, which leads to their preference for LLM texts. To mitigate the effect, we propose two approaches: 1) reducing artifact differences in training data and 2) adjusting LLM text vectors by removing their projection on the bias vector. Both methods substantially reduce source bias. We hope our study alleviates some concerns regarding LLM-generated texts in information access systems.
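The second mitigation, removing the projection of an LLM text vector on the bias vector, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the bias direction is estimated as the difference between the mean LLM embedding and the mean human embedding, which the abstract does not specify. The function names and the toy 3-dimensional vectors are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def estimate_bias_direction(llm_vecs, human_vecs):
    """Unit vector from the human-text mean to the LLM-text mean.

    Assumption: this mean-difference estimator is chosen for illustration;
    the source only says that a bias vector exists.
    """
    llm_mean = mean_vector(llm_vecs)
    human_mean = mean_vector(human_vecs)
    diff = [l - h for l, h in zip(llm_mean, human_mean)]
    norm = math.sqrt(dot(diff, diff))
    if norm == 0.0:
        raise ValueError("LLM and human means coincide; no bias direction")
    return [d / norm for d in diff]


def remove_projection(vec, unit_dir):
    """Subtract the component of vec that lies along unit_dir."""
    coeff = dot(vec, unit_dir)
    return [v - coeff * u for v, u in zip(vec, unit_dir)]


# Toy embeddings: the LLM vectors differ from the human ones only along axis 2.
human = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
llm = [[1.0, 0.0, 2.0], [0.0, 1.0, 2.0]]

bias = estimate_bias_direction(llm, human)
debiased = [remove_projection(v, bias) for v in llm]
```

In this toy setup the debiased LLM vectors have no remaining component along the estimated bias direction, so a dot-product retriever would no longer score them higher on that axis alone. Real embeddings would need the direction estimated from held-out paired data, and the effect on retrieval quality would have to be measured.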