IBM Research | Feb 26, 2026 | arXiv:2602.23184

MTRAG-UN: A Benchmark for Open Challenges in Multi-Turn RAG Conversations

Sara Rosenthal, Yannis Katsis, Vraj Shah, Lihong He, Lucian Popa, Marina Danilevsky

AI Summary

The paper introduces MTRAG-UN, a benchmark for evaluating multi-turn retrieval-augmented generation (RAG) systems on challenging conversational scenarios. It comprises 666 tasks with over 2,800 conversation turns across 6 domains, and focuses on UNanswerable, UNderspecified, and NONstandalone questions and UNclear responses. Experiments on the benchmark show that current retrieval and generation models still struggle with these kinds of conversations.

Key Contribution

Multi-turn RAG models still stumble on conversations with unanswerable questions and unclear responses, as shown by a new benchmark.

Abstract

We present MTRAG-UN, a benchmark for exploring open challenges in multi-turn retrieval augmented generation, a popular use of large language models. We release a benchmark of 666 tasks containing over 2,800 conversation turns across 6 domains with accompanying corpora. Our experiments show that retrieval and generation models continue to struggle on conversations with UNanswerable, UNderspecified, and NONstandalone questions and UNclear responses. Our benchmark is available at https://github.com/IBM/mt-rag-benchmark

Eval Frameworks & Benchmarks, Natural Language Processing, Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 15

Year: 2026

Venue: N/A
