Mar 17, 2026 | arXiv:2603.16169

Open-Source Reproduction and Explainability Analysis of Corrective Retrieval Augmented Generation

AI Summary

This paper presents a fully open-source reproduction of Corrective Retrieval Augmented Generation (CRAG), replacing proprietary components with the Wikipedia API and Phi-3-mini-4k-instruct. The open-source pipeline achieves comparable performance to the original CRAG system on PopQA and ARC-Challenge. Using SHAP, the authors perform an explainability analysis of CRAG's T5-based retrieval evaluator, finding that it primarily relies on named entity alignment rather than semantic similarity.
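As a rough illustration of what swapping a proprietary search API for the Wikipedia API can look like, the sketch below builds a full-text search request against the public MediaWiki endpoint. It is an assumption for illustration only: the summary does not say which endpoint or parameters the authors use, and the helper name `build_search_url` and the default `limit` are hypothetical.

```python
from urllib.parse import urlencode

# Public MediaWiki endpoint for English Wikipedia.
WIKIPEDIA_API = "https://en.wikipedia.org/w/api.php"


def build_search_url(query: str, limit: int = 5) -> str:
    """Build a MediaWiki full-text search URL (hypothetical helper).

    The parameter choices here are assumptions, not the authors' settings.
    """
    params = {
        "action": "query",   # standard MediaWiki query module
        "list": "search",    # full-text search submodule
        "srsearch": query,   # the search string
        "srlimit": limit,    # maximum number of hits to return
        "format": "json",
    }
    return f"{WIKIPEDIA_API}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("Marie Curie", limit=3))
```

Fetching the URL with any HTTP client should return JSON whose hits can then be passed to the retrieval evaluator as candidate documents.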

Key Contribution

CRAG's retrieval evaluator surprisingly relies on named entity alignment, not semantic similarity, to judge document quality.

Abstract

Corrective Retrieval Augmented Generation (CRAG) improves the robustness of RAG systems by evaluating retrieved document quality and triggering corrective actions. However, the original implementation relies on proprietary components including the Google Search API and closed model weights, limiting reproducibility. In this work, we present a fully open-source reproduction of CRAG, replacing proprietary web search with the Wikipedia API and the original LLaMA-2 generator with Phi-3-mini-4k-instruct. We evaluate on PopQA and ARC-Challenge, demonstrating that our open-source pipeline achieves comparable performance to the original system. Furthermore, we contribute the first explainability analysis of CRAG's T5-based retrieval evaluator using SHAP, revealing that the evaluator primarily relies on named entity alignment rather than semantic similarity. Our analysis identifies key failure modes including domain transfer limitations on science questions. All code and results are available at https://github.com/suryayalavarthi/crag-reproduction.
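The "evaluate, then trigger a corrective action" step can be sketched as a small routing function. This is a minimal sketch following the three-way scheme described in the original CRAG paper (Correct, Incorrect, Ambiguous), not the reproduction's actual code: the threshold values, the empty-input behavior, and the names `Action` and `choose_action` are assumptions for illustration.

```python
from enum import Enum
from typing import Sequence


class Action(Enum):
    CORRECT = "correct"      # at least one retrieved document looks relevant
    INCORRECT = "incorrect"  # no retrieved document looks relevant
    AMBIGUOUS = "ambiguous"  # the evaluator is not confident either way


def choose_action(
    scores: Sequence[float],
    upper: float = 0.5,    # hypothetical upper threshold
    lower: float = -0.5,   # hypothetical lower threshold
) -> Action:
    """Map per-document evaluator scores to a corrective action."""
    if not scores:
        # Assumption: with nothing retrieved, fall back to external search.
        return Action.INCORRECT
    best = max(scores)
    if best > upper:
        return Action.CORRECT
    if best < lower:
        return Action.INCORRECT
    return Action.AMBIGUOUS


if __name__ == "__main__":
    print(choose_action([0.9, -0.8]))   # one confident hit
    print(choose_action([-0.9, -0.7]))  # all documents rejected
    print(choose_action([0.1, -0.9]))   # neither threshold crossed
```

In this scheme, Correct keeps the retrieved documents, Incorrect discards them in favor of external search (the Wikipedia API in this reproduction), and Ambiguous combines both sources.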

Natural Language Processing | Open-Source Models & Weights | Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 14

Year: 2026

Venue: N/A
