Mar 2, 2026 · arXiv:2603.02153

Scaling Retrieval Augmented Generation with RAG Fusion: Lessons from an Industry Deployment

Luigi Medrano, Arush Verma, Mukul Chhabra

AI Summary

This paper evaluates retrieval fusion techniques, such as multi-query retrieval and reciprocal rank fusion (RRF), in a production-style RAG pipeline operating over an enterprise knowledge base with fixed retrieval depth, re-ranking budgets, and latency constraints. Fusion increases raw recall, but the gains are largely neutralized after re-ranking and truncation: fusion variants fail to outperform single-query baselines on KB-level Top-k accuracy, with Hit@10 decreasing from 0.51 to 0.48 in several configurations, while query rewriting and larger candidate sets add latency. The authors conclude that retrieval-level improvements do not reliably translate into end-to-end gains in production RAG systems, and argue for evaluation frameworks that jointly consider retrieval quality, system efficiency, and downstream impact.

Key Contribution

Retrieval fusion, a popular technique for boosting recall in RAG systems, surprisingly fails to improve end-to-end accuracy in a production setting with realistic constraints, even *decreasing* Hit@10 in some cases.

Abstract

Retrieval-Augmented Generation (RAG) systems commonly adopt retrieval fusion techniques such as multi-query retrieval and reciprocal rank fusion (RRF) to increase document recall, under the assumption that higher recall leads to better answer quality. While these methods show consistent gains in isolated retrieval benchmarks, their effectiveness under realistic production constraints remains underexplored. In this work, we evaluate retrieval fusion in a production-style RAG pipeline operating over an enterprise knowledge base, with fixed retrieval depth, re-ranking budgets, and latency constraints. Across multiple fusion configurations, we find that retrieval fusion does increase raw recall, but these gains are largely neutralized after re-ranking and truncation. In our setting, fusion variants fail to outperform single-query baselines on KB-level Top-$k$ accuracy, with Hit@10 decreasing from $0.51$ to $0.48$ in several configurations. Moreover, fusion introduces additional latency overhead due to query rewriting and larger candidate sets, without corresponding improvements in downstream effectiveness. Our analysis suggests that recall-oriented fusion techniques exhibit diminishing returns once realistic re-ranking limits and context budgets are applied. We conclude that retrieval-level improvements do not reliably translate into end-to-end gains in production RAG systems, and argue for evaluation frameworks that jointly consider retrieval quality, system efficiency, and downstream impact.
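
To make the mechanism described in the abstract concrete, the following is a minimal Python sketch of reciprocal rank fusion followed by truncation to a fixed depth and a Hit@k check. It is not the authors' implementation. The function names, the toy document IDs, the depth of 3, and the smoothing constant `k=60` (a commonly used RRF default) are all assumptions made for illustration. The data is contrived so that fusion enlarges the candidate pool yet pushes the one relevant document below the cutoff; it does not reproduce the paper's results.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs with RRF.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a common default, not a value taken
    from the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hit_at_k(ranked_ids, relevant_ids, k):
    """Return 1 if any relevant document appears in the top k, else 0."""
    return int(any(d in relevant_ids for d in ranked_ids[:k]))


# Hypothetical toy data: results for the original query and two rewrites.
original = ["d1", "d2", "d3", "d4"]
rewrite_a = ["d5", "d6", "d7", "d1"]
rewrite_b = ["d5", "d6", "d2", "d9"]
relevant = {"d3"}

fused = reciprocal_rank_fusion([original, rewrite_a, rewrite_b])
depth = 3  # hypothetical fixed budget passed to downstream stages

print("candidates:", len(original), "->", len(fused))
print("Hit@3 single-query:", hit_at_k(original, relevant, depth))
print("Hit@3 fused:", hit_at_k(fused, relevant, depth))
```

In this toy case the fused pool grows from 4 to 8 candidates, but documents that appear in two rewrite lists outrank the relevant document, which sits in only one list, so it falls outside the top 3 after truncation.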

Eval Frameworks & Benchmarks · Natural Language Processing · Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 6

Year: 2026

Venue: N/A
