This paper disentangles the effects of the feedback source and the feedback model in LLM-based pseudo-relevance feedback (PRF) for information retrieval. Through controlled experiments across 13 low-resource BEIR tasks, the authors evaluate five LLM PRF methods. They find that the choice of feedback model can play a critical role in PRF effectiveness, that feedback derived solely from LLM-generated text is the most cost-effective option, and that corpus-derived feedback is most beneficial when the candidate documents come from a strong first-stage retriever.
LLM-generated text alone can be a surprisingly effective and cost-efficient source of feedback for pseudo-relevance feedback, rivaling corpus-derived feedback in low-resource information retrieval tasks.
Pseudo-relevance feedback (PRF) methods built on large language models (LLMs) can be organized along two key design dimensions: the feedback source, which is where the feedback text is derived from, and the feedback model, which is how the given feedback text is used to refine the query representation. However, the independent role that each dimension plays is unclear, as both are often entangled in empirical evaluations. In this paper, we address this gap by systematically studying how the choices of feedback source and feedback model impact PRF effectiveness through controlled experimentation. Across 13 low-resource BEIR tasks with five LLM PRF methods, our results show: (1) the choice of feedback model can play a critical role in PRF effectiveness; (2) feedback derived solely from LLM-generated text provides the most cost-effective solution; and (3) feedback derived from the corpus is most beneficial when utilizing candidate documents from a strong first-stage retriever. Together, our findings provide a better understanding of which elements in the PRF design space are most important.
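To make the two design dimensions concrete, the following is a minimal sketch, not the paper's implementation. It treats the feedback source and the feedback model as independent, swappable functions over bag-of-words term weights. Every name here (`corpus_source`, `generated_source`, `concat_model`, `interpolation_model`, `fake_llm`) is hypothetical, the term-overlap scorer stands in for a real first-stage retriever, and the stubbed generator stands in for an actual LLM call.

```python
from collections import Counter
from typing import Callable, Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def overlap_score(query: str, doc: str) -> int:
    """Toy first-stage score: count of doc tokens that appear in the query."""
    q_terms = set(tokenize(query))
    return sum(1 for tok in tokenize(doc) if tok in q_terms)


# Feedback sources: WHERE the feedback text comes from.
def corpus_source(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Corpus-derived feedback: the top-k documents from the toy retriever."""
    ranked = sorted(corpus, key=lambda d: overlap_score(query, d), reverse=True)
    return ranked[:k]


def generated_source(query: str, generate: Callable[[str], str]) -> List[str]:
    """LLM-derived feedback: text from a generator (assumed callable)."""
    return [generate(query)]


# Feedback models: HOW the feedback text refines the query representation.
def concat_model(query: str, feedback: List[str]) -> Dict[str, float]:
    """Append feedback text to the query and count terms uniformly."""
    counts = Counter(tokenize(query))
    for text in feedback:
        counts.update(tokenize(text))
    return {term: float(c) for term, c in counts.items()}


def interpolation_model(query: str, feedback: List[str],
                        alpha: float = 1.0, beta: float = 0.5) -> Dict[str, float]:
    """Rocchio-style mix: alpha * query terms + beta * mean feedback terms."""
    weights = {t: alpha * c for t, c in Counter(tokenize(query)).items()}
    for text in feedback:
        for term, c in Counter(tokenize(text)).items():
            weights[term] = weights.get(term, 0.0) + beta * c / len(feedback)
    return weights


corpus = [
    "solar panels convert sunlight",
    "wind turbines convert wind",
    "cats sleep all day",
]
query = "solar energy"


def fake_llm(q: str) -> str:
    """Stub standing in for an LLM call; the output is made up."""
    return "photovoltaic cells produce electricity"


sources = {
    "corpus": corpus_source(query, corpus, k=1),
    "llm": generated_source(query, fake_llm),
}
models = {"concat": concat_model, "interpolate": interpolation_model}

# Controlled grid: hold one dimension fixed while varying the other.
grid = {
    (s_name, m_name): model(query, feedback)
    for s_name, feedback in sources.items()
    for m_name, model in models.items()
}
```

Because each source returns plain text and each model accepts plain text, any source can be paired with any model. That pairing is the property a controlled comparison of the two dimensions relies on; in an actual study, each cell of `grid` would be scored with a retrieval metric rather than inspected directly.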