This paper investigates the relationship between retrieval quality and the information coverage of generated responses in Retrieval-Augmented Generation (RAG) systems across text and multimodal benchmarks. The authors analyzed 15 text and 10 multimodal retrieval stacks within several RAG pipelines, evaluating both coverage-based retrieval metrics and the nugget coverage of generated responses. They found strong correlations between the two at both topic and system levels, especially when retrieval objectives align with generation goals.
Stop blindly optimizing for retrieval relevance in RAG pipelines: coverage-based retrieval metrics are better early indicators of the final generated response's information coverage.
Retrieval-augmented generation (RAG) systems combine document retrieval with a generative model to address complex information-seeking tasks like report generation. While the relationship between retrieval quality and generation effectiveness seems intuitive, it has not been systematically studied. We investigate whether upstream retrieval metrics can serve as reliable early indicators of the final generated response's information coverage. Through experiments across two text RAG benchmarks (TREC NeuCLIR 2024 and TREC RAG 2024) and one multimodal benchmark (WikiVideo), we analyze 15 text retrieval stacks and 10 multimodal retrieval stacks across four RAG pipelines and multiple evaluation frameworks (Auto-ARGUE and MiRAGE). Our findings demonstrate strong correlations between coverage-based retrieval metrics and nugget coverage in generated responses at both topic and system levels. This relationship holds most strongly when retrieval objectives align with generation goals, though more complex iterative RAG pipelines can partially decouple generation quality from retrieval effectiveness. These findings provide empirical support for using retrieval metrics as proxies for RAG performance.
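To make the core idea concrete, here is a minimal, hypothetical sketch of the kind of analysis the abstract describes: computing a coverage-based retrieval score and a nugget-coverage score per topic, then correlating the two. All function names and the toy data are illustrative assumptions; the paper's actual metrics (via Auto-ARGUE and MiRAGE) are more involved.

```python
# Hypothetical sketch: correlate a coverage-based retrieval metric with
# nugget coverage of generated responses at the topic level.
# Names and data are illustrative, not the paper's implementation.

def retrieval_coverage(gold_nuggets, retrieved_nuggets):
    """Fraction of gold nuggets present in the retrieved documents."""
    if not gold_nuggets:
        return 0.0
    return len(gold_nuggets & retrieved_nuggets) / len(gold_nuggets)

def nugget_coverage(gold_nuggets, response_nuggets):
    """Fraction of gold nuggets expressed in the generated response."""
    if not gold_nuggets:
        return 0.0
    return len(gold_nuggets & response_nuggets) / len(gold_nuggets)

def pearson(xs, ys):
    """Plain Pearson correlation over paired per-topic scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Toy per-topic tuples: (gold nuggets, nuggets in retrieved docs,
# nuggets in the generated response).
topics = [
    ({"a", "b", "c"},      {"a", "b"},      {"a", "b"}),
    ({"a", "b"},           {"a"},           {"a"}),
    ({"a", "b", "c", "d"}, {"a", "b", "c"}, {"a", "b", "c"}),
]
xs = [retrieval_coverage(g, r) for g, r, _ in topics]
ys = [nugget_coverage(g, resp) for g, _, resp in topics]
print(round(pearson(xs, ys), 3))
```

In this toy data the generator expresses exactly the nuggets that retrieval surfaced, so the correlation is perfect; the paper's finding is the empirical version of this pattern, weakened when iterative pipelines partially decouple generation from retrieval.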