The paper identifies and quantifies language bias in multilingual Retrieval-Augmented Generation (mRAG) systems, showing that rerankers favor English and the query's own language. The authors introduce an "estimated oracle evidence analysis" to quantify a substantial gap between current rerankers and the achievable upper bound. To close this gap, they propose LAURA, a language-agnostic, utility-driven reranker alignment method that improves mRAG performance across diverse languages and generation models.
Multilingual RAG systems are systematically suppressing "answer-critical" documents in non-English languages, crippling their ability to leverage global knowledge.
Multilingual Retrieval-Augmented Generation (mRAG) leverages cross-lingual evidence to ground Large Language Models (LLMs) in global knowledge. However, we show that current mRAG systems suffer from a language bias during reranking, systematically favoring English and the query's native language. By introducing an estimated oracle evidence analysis, we quantify a substantial performance gap between existing rerankers and the achievable upper bound. Further analysis reveals a critical distributional mismatch: while optimal predictions require evidence scattered across multiple languages, current systems systematically suppress such "answer-critical" documents, thereby limiting downstream generation performance. To bridge this gap, we propose Language-Agnostic Utility-driven Reranker Alignment (LAURA), which aligns multilingual evidence ranking with downstream generative utility. Experiments across diverse languages and generation models show that LAURA effectively mitigates language bias and consistently improves mRAG performance.
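The abstract does not spell out how "downstream generative utility" is computed, so the sketch below is a rough illustration only, not the paper's actual method. It scores each retrieved document by a simple proxy for utility: the token-level F1 between the gold answer and what a generator produces when grounded on that single document, then ranks candidates by that score regardless of their language. All names here (`Candidate`, `generative_utility`, the F1 proxy, single-document grounding) are assumptions introduced for illustration.

```python
# Hypothetical sketch of utility-driven evidence ranking. The paper's actual
# LAURA objective is not given in this abstract; this only illustrates the
# general idea of scoring documents by downstream answer quality rather than
# by (possibly language-biased) query-document similarity.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    doc_id: str
    language: str        # e.g. "en", "sw", "zh"
    text: str
    retriever_score: float  # original first-stage score (may carry language bias)


def answer_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a generated answer and the gold answer."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    common = sum(min(pred.count(t), ref.count(t)) for t in set(pred))
    if common == 0:
        return 0.0
    precision, recall = common / len(pred), common / len(ref)
    return 2 * precision * recall / (precision + recall)


def generative_utility(
    generate: Callable[[str, str], str],  # any (query, context) -> answer LLM call
    query: str,
    doc: Candidate,
    gold_answer: str,
) -> float:
    """Utility of `doc`: how well the generator answers when grounded on it.

    One simple proxy (single-document grounding + token F1); the paper may
    define utility differently.
    """
    prediction = generate(query, doc.text)
    return answer_f1(prediction, gold_answer)


def utility_rank(
    generate: Callable[[str, str], str],
    query: str,
    gold_answer: str,
    candidates: List[Candidate],
) -> List[Candidate]:
    """Rank candidates by generative utility, ignoring document language."""
    return sorted(
        candidates,
        key=lambda c: generative_utility(generate, query, c, gold_answer),
        reverse=True,
    )
```

Because these utility scores require the gold answer, a ranking like this fits an oracle-style analysis (estimating the upper bound the abstract describes) or the construction of training labels for reranker alignment; at inference time, a reranker trained on such labels would score documents without access to gold answers.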