Mar 2, 2026 · arXiv:2603.01417

ReFeed: Retrieval Feedback-Guided Dataset Construction for Style-Aware Query Rewriting

Jiyoon Myung, Jungki Son, Kyungro Lee, Jihyeon Park, Joohyung Han

AI Summary

The paper introduces ReFeed, a novel framework for generating query rewriting datasets that are sensitive to the stylistic characteristics of target documents in retrieval systems. ReFeed identifies failed retrieval cases, uses LLMs to rewrite queries to match the style of relevant documents, and validates improvements through re-retrieval, creating a corpus of (original, rewritten) query pairs. Experiments demonstrate that training rewriter models on ReFeed-generated data improves retrieval performance by aligning query style with document style, enhancing the adaptability of RAG systems.
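The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. A toy token-overlap retriever stands in for a real retrieval system, and `fake_llm_rewrite` is a hypothetical stand-in for the LLM call, returning canned rewrites so the sketch runs offline. The names `DOCS`, `QUERIES`, `retrieve`, and `build_refeed_pairs` are also hypothetical. The failure criterion (the relevant document is missing from the top-k results, k=1 by default) is likewise an assumption.

```python
import re
from typing import Callable

# Hypothetical toy corpus and queries: each query is paired with the ID
# of the document that should have been retrieved for it.
DOCS = {
    "d1": "Procedure for retrieving a debit card retained by an ATM",
    "d2": "How to reset your online banking password",
}
QUERIES = [
    ("the machine swallowed my card, how to get it back", "d1"),
    ("reset online banking password", "d2"),
    ("how to get my card back", "d1"),
]


def tokenize(text: str) -> set[str]:
    """Lowercase and split into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: dict[str, str], k: int = 1) -> list[str]:
    """Toy lexical retriever: rank document IDs by shared-token count."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(docs[d])), reverse=True)
    return ranked[:k]


def fake_llm_rewrite(query: str, style_doc: str) -> str:
    """Hypothetical stand-in for an LLM prompted to rewrite `query` in the
    style of `style_doc`. Canned outputs keep the sketch offline; `style_doc`
    is ignored here."""
    canned = {
        "the machine swallowed my card, how to get it back":
            "Procedure for retrieving a card retained by an ATM machine",
        "how to get my card back": "how to get my bank card back",
    }
    return canned.get(query, query)


def build_refeed_pairs(
    queries: list[tuple[str, str]],
    docs: dict[str, str],
    rewrite: Callable[[str, str], str],
    k: int = 1,
) -> list[tuple[str, str]]:
    """Collect (original, rewritten) pairs whose rewrite fixes a retrieval failure."""
    pairs = []
    for query, gold_id in queries:
        # 1. Keep only failed cases: the relevant doc is not in the top-k.
        if gold_id in retrieve(query, docs, k):
            continue
        # 2. Rewrite the query using the relevant document as a style reference.
        rewritten = rewrite(query, docs[gold_id])
        # 3. Verify by re-retrieval; discard rewrites that still miss the doc.
        if gold_id in retrieve(rewritten, docs, k):
            pairs.append((query, rewritten))
    return pairs


if __name__ == "__main__":
    for original, rewritten in build_refeed_pairs(QUERIES, DOCS, fake_llm_rewrite):
        print(f"{original!r} -> {rewritten!r}")
```

In this toy run, the second query already retrieves its document and is skipped. The third query's rewrite still ranks `d2` first, so it is discarded. Only the first pair survives verification. A real pipeline would replace the toy retriever and the canned rewriter with an actual retrieval system and LLM.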

Key Contribution

LLMs can be prompted to rewrite failed queries in the style of the relevant documents. Keeping only the rewrites that re-retrieval confirms as improvements yields a training dataset that improves retrieval by aligning queries with domain-specific language.

Abstract

Retrieval systems often fail when user queries differ stylistically or semantically from the language used in domain documents. Query rewriting has been proposed to bridge this gap, improving retrieval by reformulating user queries into semantically equivalent forms. However, most existing approaches overlook the stylistic characteristics of target documents (their domain-specific phrasing, tone, and structure), which are crucial for matching real-world data distributions. We introduce a retrieval feedback-driven dataset generation framework that automatically identifies failed retrieval cases, leverages large language models to rewrite queries in the style of relevant documents, and verifies improvement through re-retrieval. The resulting corpus of (original, rewritten) query pairs enables the training of rewriter models that are explicitly aware of document style and retrieval feedback. This work highlights a new direction in data-centric information retrieval, emphasizing how feedback loops and document-style alignment can enhance the reasoning and adaptability of RAG systems in real-world, domain-specific contexts.
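The abstract says the (original, rewritten) pairs are used to train rewriter models, but it does not give a data format. One common option is a JSON Lines file for sequence-to-sequence fine-tuning, shown here purely as an assumption. The `pairs_to_jsonl` helper, the `input`/`target` field names, and the T5-style `rewrite query: ` prefix are all hypothetical.

```python
import json

# Hypothetical T5-style task prefix; the paper's training format is not specified.
PREFIX = "rewrite query: "


def pairs_to_jsonl(pairs: list[tuple[str, str]], prefix: str = PREFIX) -> str:
    """Serialize verified (original, rewritten) pairs as JSON Lines records:
    the original query (with prefix) is the model input, and the
    document-style rewrite is the training target."""
    records = (
        json.dumps({"input": prefix + original, "target": rewritten}, ensure_ascii=False)
        for original, rewritten in pairs
    )
    return "\n".join(records)


if __name__ == "__main__":
    pairs = [
        ("the machine swallowed my card, how to get it back",
         "Procedure for retrieving a card retained by an ATM machine"),
    ]
    print(pairs_to_jsonl(pairs))
```

With one record per line, the corpus is easy to stream or append to as new failure cases are mined. Because of the upstream re-retrieval check, every target is a rewrite already known to surface its relevant document.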

Data Curation & Synthetic Data · Natural Language Processing · Recommendation & Information Retrieval

Citation Metrics

Citations: 0
Influential citations: 0
References: 6
Year: 2026
Venue: N/A
