Ronak Pradeep

1 University of Waterloo, 2 Microsoft, 3 Zipf AI

Abstract

The second edition of the TREC Retrieval Augmented Generation (RAG) Track advances research on systems that integrate retrieval and generation to address complex, real-world information needs. Building on the foundation of the inaugural 2024 track, this year’s challenge introduces long, multi-sentence narrative queries to better reflect deep search tasks and the growing demand for reasoning-driven responses. Participants are tasked with designing pipelines that combine retrieval and generation while ensuring transparency and factual grounding. The track leverages the MS MARCO V2.1 corpus and employs a multi-layered evaluation framework encompassing relevance assessment, response completeness, attribution verification, and agreement analysis. With over 150 submissions this year and an emphasis on multi-faceted narratives and attribution-rich answers, the TREC 2025 RAG Track aims to foster innovation in creating trustworthy, context-aware systems for retrieval augmented generation.

Track website: https://trec-rag.github.io

1 Introduction

This paper provides an overview of the TREC 2025 Retrieval Augmented Generation (RAG) Track. This second edition builds on the foundation laid by the inaugural TREC 2024 RAG Track [4, 9, 5, 8], pushing research further into systems that integrate retrieval and generation for complex, real-world information needs. We harness the institutional knowledge and resources provided by the National Institute of Standards and Technology (NIST) via the Text Retrieval Conference (TREC) to tackle these challenges. Now in its 34th year, TREC has led the way in many aspects of evaluation in information retrieval (IR), natural language processing (NLP), and beyond, producing many innovations that the community (both researchers and practitioners) takes for granted today.

Building upon last year’s evaluation strategy at the TREC 2024 RAG Track, this year’s track makes a key change: it moves from short, keyword-style queries to long, complex, multi-sentence narratives that mimic deep search scenarios. This shift reflects the growing demand for RAG and agentic systems (or agents) capable of nuanced interpretation, broader evidence coverage, and reasoning-driven responses. Participants are challenged to design pipelines that not only combine retrieval and generation but also maintain transparency and factual grounding in their outputs. These changes aim to foster innovation in creating systems that can handle intricate queries and deliver well-supported, context-rich answers.

To support these advancements, the track continues to use the MS MARCO V2.1 document corpus constructed last year for the TREC 2024 RAG Track, which provides document segmentation and deduplication, both useful for assessing the retrieval and generation tasks in our track. The corpus ensures a diverse document representation, enabling participants to build systems that can retrieve relevant evidence and generate responses grounded in verifiable and reliable sources. By introducing narratives and emphasizing attribution, the TREC 2025 RAG Track encourages approaches that go beyond surface-level matching, requiring deeper reasoning and synthesis of information across multiple documents.

This year, the track employs a multi-layered evaluation workflow to ensure reliability and depth:

1. Relevance Assessment: Human assessors judge document relevance using sub-narratives, assigning relevance scores between 0 (irrelevant) and 4 (perfectly relevant) based on the number of sub-narratives that the given document answers.
2. Response Evaluation: For relevant documents, key information units (“nuggets”) are extracted and mapped to sub-narratives to measure content coverage and answer completeness.
3. Support Evaluation: Each sentence in a generated answer is checked against its cited document to verify factual grounding.
4. Agreement Analysis: System labels are compared with human judgments using metrics such as Cohen’s κ to assess alignment beyond chance.

This layered approach combines manual and automated methods for assessment, enabling a fine-grained evaluation of retrieval quality, attribution accuracy, and response completeness. By emphasizing narrative-driven queries and attribution-rich answers, the TREC 2025 RAG Track aims to push the boundaries of retrieval-augmented generation research, fostering innovations that improve reasoning, coverage, and trustworthiness in AI-driven systems.

Query: I’m interested in sports’ societal impact, particularly concerning athlete compensation, inclusion, cultural influence, and the business side. I also want to understand how evolving equipment, training, and mindset shape both athletes and the popularity of different sports.

Answer: Sports exert profound societal impacts through athlete compensation, inclusion, cultural dynamics, and commercialization, while evolving equipment, training, and mindsets reshape athletic performance and sport popularity. Athlete compensation disparities persist, with female athletes and college athletes receiving inequitable pay compared to their male and professional counterparts, raising ethical questions about just compensation and the commodification of athletes [0]. College athletes, despite generating significant revenue, often lack direct salaries, though debates persist over whether scholarships suffice or if additional compensation is warranted [0]. Inclusion remains a challenge, as racial and gender disparities persist in leadership roles and access to opportunities. For example, Black women are underrepresented in sports like field hockey and golf, while Title IX has increased female participation but not fully addressed coaching diversity.
Socioeconomic barriers further limit access to elite sports, as expensive equipment and coaching exclude lower-income individuals [2]. Culturally, sports reflect and reinforce societal values, serving as a microcosm of racial and gender dynamics. They also foster unity, transcending social and political divides, while commercialization ties sports to consumerism, with sponsorships prioritizing brand visibility over equitable access [3]. The business side of sports, driven by media and corporate interests, amplifies revenue but risks prioritizing profit over athlete welfare, as seen in debates over performance-enhancing drugs and gambling [6]. Evolving equipment and training methods, such as advanced analytics and technology, enhance performance but raise ethical concerns about fairness and health risks [5]. Mental and physical training paradigms, including discipline and resilience, influence athlete success and public perceptions of sports [4]. Meanwhile, shifting societal values—like prioritizing health and inclusivity—reshape popular sports, with pleasure and participation sports gaining traction as alternatives to high-stakes power sports [5]. These interwoven factors highlight sports’ dual role as both a reflection of societal progress and a catalyst for change, though persistent inequities and commercial pressures underscore ongoing challenges [1].

Table 1: Sample answer from the TREC 2025 RAG Track for narrative 14 with citations highlighted.

2 Task Setup

The TREC 2025 RAG Track comprised four interconnected tasks: (1) Retrieval (R), (2) Augmented Generation (AG), (3) Retrieval Augmented Generation (RAG), and (4) Relevance Judgment (RJ), a new task this year. All four were built on the MS MARCO V2.1 segmented document collection [4], which addressed deduplication and provided fine-grained segmentation to support robust retrieval and grounded generation. Participants were given 105 queries (called narratives in TREC parlance); their ultimate task was to return, for the AG and RAG tasks, well-formed answers for each narrative (up to a maximum of 400 words). The Retrieval (R) task could be viewed as an intermediate product in a full RAG pipeline. Furthermore, the Relevance Judgment (RJ) task required participants to provide a set of relevance judgments. Throughout this paper, narrative 14 is used as a running example: “I’m interested in sports’ societal impact, particularly concerning athlete compensation, inclusion, cultural influence, and the business side. I also want to understand how evolving equipment, training, and mindset shape both athletes and the popularity of different sports.” A system-generated answer using Qwen3
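
The agreement analysis step of the evaluation workflow compares system labels with human judgments using Cohen’s κ, defined as κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The following is a minimal Python sketch of that computation, not the track's evaluation code: the function name `cohens_kappa` and the toy labels on the 0 to 4 relevance scale are illustrative assumptions.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(system_labels: Sequence[Hashable],
                 human_labels: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa between two label sequences over the same items."""
    if len(system_labels) != len(human_labels):
        raise ValueError("label sequences must have the same length")
    n = len(system_labels)
    if n == 0:
        raise ValueError("at least one labeled item is required")

    # Observed agreement: fraction of items where both labelers agree.
    p_o = sum(s == h for s, h in zip(system_labels, human_labels)) / n

    # Chance agreement: for each category, the product of the two
    # labelers' marginal proportions, summed over all categories.
    sys_counts = Counter(system_labels)
    hum_counts = Counter(human_labels)
    p_e = sum(sys_counts[c] * hum_counts[c] for c in sys_counts) / (n * n)

    if p_e == 1.0:
        # Both labelers used one identical category for every item, so kappa
        # is undefined (0/0); returning 1.0 here is a convention of this sketch.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels (assumed for illustration, not track data).
human = [0, 1, 2, 3, 4, 0, 1, 2]
system = [0, 1, 2, 3, 3, 0, 2, 2]
print(round(cohens_kappa(system, human), 2))  # 0.68
```

In this toy example the labelers agree on 6 of 8 items (p_o = 0.75) while chance agreement is p_e = 14/64, so κ = 0.68. This unweighted form treats every disagreement equally; for graded labels, a weighted variant that penalizes larger gaps more heavily may be preferable.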
