Beihang · Nankai University · May 28, 2026 · arXiv:2605.29886

CRITIC-R1: Learning Structured Critics for Retrieval-Augmented Generation

Wen Xiao, Ziwei Zhang, Chuanyue Yu, Xingcheng Fu, Qingyu Sun, Runhua Xu, Jianxin Li

AI Summary

CRITIC-R1 is a structured critic framework that uses reinforcement learning to diagnose and correct errors in retrieval-augmented generation (RAG) systems. It decomposes RAG critique into diagnostic dimensions: verdict, error location, reasoning analysis, and fix generation. Trained with two reward functions, Conservative Judgement Alignment (CJA) and Diagnostic Quality Alignment (DQA), CRITIC-R1 learns to give calibrated, fine-grained feedback that improves answer quality over strong RAG baselines across five QA benchmarks.
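As a rough illustration of what "structured" critique means here, the sketch below represents one critic output as a record with the four diagnostic dimensions and uses the verdict to decide whether the generator's answer is replaced. The JSON output format, the field names, the `correct`/`incorrect` verdict labels, and the helpers `parse_critique` and `refine` are hypothetical assumptions for illustration; the paper's actual output format may differ.

```python
import json
from dataclasses import dataclass
from typing import Optional


# Hypothetical schema for one critique; field names are illustrative assumptions.
@dataclass
class Critique:
    verdict: str                       # assumed labels: "correct" or "incorrect"
    error_location: Optional[str]      # e.g. the faulty reasoning step or claim
    reasoning_analysis: Optional[str]  # why the located step is wrong
    fix: Optional[str]                 # proposed corrected answer

    def needs_revision(self) -> bool:
        return self.verdict == "incorrect"


def parse_critique(raw: str) -> Critique:
    """Parse a critic output, assumed (hypothetically) to be a JSON object."""
    data = json.loads(raw)
    verdict = str(data.get("verdict", "")).strip().lower()
    if verdict not in {"correct", "incorrect"}:
        raise ValueError(f"unknown verdict: {verdict!r}")
    return Critique(
        verdict=verdict,
        error_location=data.get("error_location"),
        reasoning_analysis=data.get("reasoning_analysis"),
        fix=data.get("fix"),
    )


def refine(answer: str, critique: Critique) -> str:
    """Keep the original answer unless the critic flags it and supplies a fix."""
    if critique.needs_revision() and critique.fix:
        return critique.fix
    return answer


# Usage with a made-up critic output.
raw = ('{"verdict": "incorrect", "error_location": "step 2", '
       '"reasoning_analysis": "The answer cites a passage about a different person.", '
       '"fix": "Paris"}')
critique = parse_critique(raw)
print(refine("Lyon", critique))  # prints: Paris
```

Defaulting to the original answer whenever the verdict is `correct` or no fix is given mirrors the conservative stance the summary describes: the critic only intervenes when it commits to a diagnosis.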

Key Contribution

CRITIC-R1 learns to diagnose and fix RAG errors with structured feedback, outperforming strong baselines on knowledge-intensive QA.

Abstract

Retrieval-augmented generation (RAG) improves knowledge-intensive question answering by incorporating external evidence. However, existing RAG methods still suffer from hallucinations and subtle reasoning errors. Recent studies introduce external critics to refine RAG outputs, yet they often provide coarse-grained and weakly structured feedback, exhibit over-aggressive intervention, and lead to noisy and unreliable refinement, limiting their effectiveness for correction. To tackle these issues, we propose CRITIC-R1, a structured critic framework that formulates and learns RAG critique as an explicit error diagnosis problem using reinforcement learning (RL). Our framework categorizes common RAG errors into multiple diagnostic dimensions, including verdict, error location, reasoning analysis, and fix generation. To learn these capabilities, we design two reward functions: Conservative Judgement Alignment (CJA) first encourages calibrated high-level judgements while mitigating over-aggressive intervention, whereas Diagnostic Quality Alignment (DQA) further improves fine-grained diagnostic feedback through gated rewards. We train the critic model using GRPO-based RL with process-level supervision collected from external LLM teacher models. Experiments across five QA benchmarks show that CRITIC-R1 consistently improves answer quality over strong RAG baselines. Our source code is available at https://anonymous.4open.science/r/critic-r1-FCB0
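The abstract names the two rewards but not their formulas. The sketch below is one plausible reading under stated assumptions: CJA is modeled as an asymmetric verdict reward that penalizes false alarms (flagging a correct answer as wrong) more heavily than misses, and DQA is modeled as a gated reward that scores error location, analysis, and fix only when the critic correctly flags an erroneous answer. The teacher labels, scores, penalty weights, and function names are all hypothetical. The last function shows the standard GRPO step of normalizing each sampled output's reward against the mean and standard deviation of its group (population standard deviation here).

```python
import statistics
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    pred_verdict: str      # critic's verdict: "correct" or "incorrect"
    gold_verdict: str      # hypothetical label from an LLM teacher
    location_score: float  # hypothetical match to the teacher's error location, in [0, 1]
    analysis_score: float  # hypothetical quality score of the reasoning analysis, in [0, 1]
    fix_correct: bool      # hypothetical: does the proposed fix yield the right answer?


# Hypothetical weights; the paper's actual values are not given in the abstract.
FALSE_ALARM_PENALTY = -1.0  # flagging a correct answer as wrong (over-aggressive)
MISS_PENALTY = -0.5         # letting a wrong answer through


def cja_reward(s: Sample) -> float:
    """Assumed CJA form: reward a matching verdict, punish false alarms hardest."""
    if s.pred_verdict == s.gold_verdict:
        return 1.0
    if s.pred_verdict == "incorrect":  # gold is "correct", so this is a false alarm
        return FALSE_ALARM_PENALTY
    return MISS_PENALTY


def dqa_reward(s: Sample) -> float:
    """Assumed DQA form: diagnostic credit only if the critic correctly flags an error."""
    if not (s.pred_verdict == s.gold_verdict == "incorrect"):
        return 0.0  # gate closed: no credit for diagnostics
    return (s.location_score + s.analysis_score + float(s.fix_correct)) / 3.0


def total_reward(s: Sample) -> float:
    return cja_reward(s) + dqa_reward(s)


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalize rewards within the group sampled for one prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Usage: four critiques sampled for the same question (made-up values).
group = [
    Sample("incorrect", "incorrect", 1.0, 0.5, True),   # correct flag, good diagnosis
    Sample("incorrect", "correct", 0.0, 0.0, False),    # false alarm
    Sample("correct", "incorrect", 0.0, 0.0, False),    # missed error
    Sample("correct", "correct", 0.0, 0.0, False),      # correct pass-through
]
rewards = [total_reward(s) for s in group]
advantages = group_advantages(rewards)
print([round(r, 3) for r in rewards])  # prints: [1.833, -1.0, -0.5, 1.0]
```

Under these assumed weights, an over-aggressive critic that flags a correct answer ends up with a lower group-relative advantage than one that stays silent on a wrong answer, which is the kind of pressure toward conservative judgement that CJA is described as providing.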

Natural Language Processing · Reasoning & Chain-of-Thought · Recommendation & Information Retrieval

Citation Metrics

Citations: 0
Influential citations: 0
References: 27
Year: 2026
Venue: N/A
