Mar 12, 2026 | arXiv:2603.11772

Legal-DC: Benchmarking Retrieval-Augmented Generation for Legal Documents

Yaocong Li, Q. Lan, Leihan Zhang, Le Zhang

AI Summary

The paper introduces Legal-DC, a new benchmark dataset for evaluating Retrieval-Augmented Generation (RAG) systems in Chinese legal contexts, comprising 480 documents and 2,475 question-answer pairs with clause-level annotations. To address the limitations of existing RAG systems in handling structured legal provisions, the authors propose LegRAG, a framework that incorporates legal adaptive indexing and a dual-path self-reflection mechanism. Experiments demonstrate that LegRAG outperforms state-of-the-art methods on Legal-DC, with improvements ranging from 1.3% to 5.6% across key evaluation metrics.

Key Contribution

Chinese legal RAG gains two resources: Legal-DC, a new benchmark, and LegRAG, a framework that uses clause-boundary segmentation and dual-path self-reflection to outperform prior methods by up to 5.6% on key evaluation metrics.

Abstract

Retrieval-Augmented Generation (RAG) has emerged as a promising technology for legal document consultation, yet its application in Chinese legal scenarios faces two key limitations: existing benchmarks lack specialized support for joint retriever-generator evaluation, and mainstream RAG systems often fail to accommodate the structured nature of legal provisions. To address these gaps, this study makes two core contributions. First, we construct the Legal-DC benchmark dataset, comprising 480 legal documents (covering areas such as market regulation and contract management) and 2,475 refined question-answer pairs, each annotated with clause-level references, which fills the gap in specialized evaluation resources for Chinese legal RAG. Second, we propose the LegRAG framework, which integrates legal adaptive indexing (clause-boundary segmentation) with a dual-path self-reflection mechanism to preserve clause integrity while improving answer accuracy. In addition, we introduce automated evaluation methods for large language models to meet the high-reliability demands of legal retrieval scenarios. LegRAG outperforms existing state-of-the-art methods by 1.3% to 5.6% across key evaluation metrics. This research provides a specialized benchmark, a practical framework, and empirical insights to advance the development of Chinese legal RAG systems. Our code and data are available at https://github.com/legal-dc/Legal-DC.
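The abstract does not spell out how clause-boundary segmentation is implemented, so the following is only a minimal sketch of the general idea, not the authors' LegRAG code. It assumes that each article of a Chinese statute opens a line with a marker such as "第一条" or "第12条"; the pattern `ARTICLE_MARKER` and the function `split_into_clauses` are hypothetical names introduced for illustration. The point is that each chunk handed to the retriever holds exactly one whole article, instead of a fixed-length window that may cut a provision in half.

```python
import re

# Assumption: every article starts a line with a marker like "第一条" or "第12条".
# This pattern is illustrative and is not taken from the paper.
ARTICLE_MARKER = re.compile(r"^第[一二三四五六七八九十百千零〇两\d]+条", re.MULTILINE)


def split_into_clauses(text: str) -> list[str]:
    """Split a statute into chunks that each contain one complete article."""
    starts = [m.start() for m in ARTICLE_MARKER.finditer(text)]
    if not starts:
        # No recognizable markers: keep the document as a single chunk.
        stripped = text.strip()
        return [stripped] if stripped else []

    chunks = []
    # Text before the first article (title, preamble) becomes its own chunk.
    preamble = text[: starts[0]].strip()
    if preamble:
        chunks.append(preamble)
    # Each article runs from its marker to the next marker or the end of text.
    for begin, end in zip(starts, starts[1:] + [len(text)]):
        chunks.append(text[begin:end].strip())
    return chunks


if __name__ == "__main__":
    sample = (
        "示例管理办法\n"
        "第一条 为了规范市场秩序,制定本办法。\n"
        "第二条 经营者应当遵守本办法。\n"
        "违反前款规定的,依照第五条处理。\n"
        "第三条 本办法自公布之日起施行。"
    )
    for chunk in split_into_clauses(sample):
        print(repr(chunk))
```

Because the marker is anchored to the start of a line, a cross-reference such as "依照第五条处理" inside an article does not trigger a split, so the second article stays in one chunk together with its continuation line.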

Eval Frameworks & Benchmarks | Natural Language Processing | Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 34

Year: 2026

Venue: N/A
