Tsinghua AI | BUPT | Waterloo | Jun 8, 2026 | arXiv:2606.09389

LexRubric: A Rubric-Guided Diagnostic Benchmark for Open-Ended Legal Tasks

Yifan Chen, Haitao Li, Yiran Hu, Kaisong Song, Jun Lin, Yueyue Wu, Qingyao Ai, Min Zhang, Yiqun Liu

AI Summary

This paper introduces LexRubric, a rubric-guided benchmark for evaluating large language models (LLMs) on open-ended Chinese legal tasks. It comprises 649 instances across 14 legal scenarios and 12,337 expert-written atomic scoring criteria organized under a six-dimensional framework, which supports fine-grained diagnostic evaluation that pinpoints specific response-quality failures. An evaluation of 18 general and legal-domain LLMs shows that models exhibit distinct capability profiles and that open-ended legal question answering remains challenging.

Key Contribution

LexRubric shows that open-ended legal tasks remain challenging for current LLMs and, through rubric-based diagnostic evaluation, that different models exhibit distinct capability profiles.

Abstract

As large language models (LLMs) are increasingly applied to real-world legal tasks, evaluating the reliability of their open-ended legal responses has become essential. These tasks require context-sensitive answers and allow little room for error, motivating fine-grained, diagnostic evaluation that can identify specific sources of response-quality failures. We introduce LexRubric, a rubric-based benchmark for evaluating open-ended Chinese legal tasks. LexRubric contains 649 instances drawn from legal consultation and judicial examination, which reflect both everyday legal needs and professional legal reasoning and cover 14 legal scenarios. It further includes 12,337 expert-written atomic scoring criteria organized under a unified six-dimensional framework, enabling accurate evaluation and diagnostic analysis across tasks and evaluation dimensions. To validate the reliability of the evaluation, we test multiple judge models and compare model-based judgments with human judgments. We further evaluate 18 recent general and legal-domain LLMs on LexRubric. Results show that different models exhibit distinct capability profiles, and that open-ended legal question answering remains challenging for current LLMs. Data is available at: https://github.com/foggpoy/LexRubric.
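
As a rough illustration of how atomic criteria grouped by dimension could support diagnostic analysis, the sketch below aggregates per-criterion verdicts into per-dimension and overall scores. It is not the LexRubric scoring procedure, which the abstract does not specify: the `Criterion` schema, the point weights, the binary verdicts, and the labels `dimension_a` and `dimension_b` are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One atomic rubric item (hypothetical schema, not the LexRubric data format)."""
    dimension: str   # placeholder label; the six actual dimension names are not given here
    points: float    # assumed positive weight for this criterion
    satisfied: bool  # assumed binary verdict from a judge model or human annotator


def dimension_scores(criteria):
    """Return the fraction of available points earned within each dimension."""
    earned = defaultdict(float)
    possible = defaultdict(float)
    for c in criteria:
        possible[c.dimension] += c.points
        if c.satisfied:
            earned[c.dimension] += c.points
    return {d: earned[d] / possible[d] for d in possible}


def overall_score(criteria):
    """Return the fraction of all available points earned by one response."""
    total = sum(c.points for c in criteria)
    return sum(c.points for c in criteria if c.satisfied) / total


# Toy verdicts for a single response; the values are invented for the example.
example = [
    Criterion("dimension_a", 2.0, True),
    Criterion("dimension_a", 1.0, False),
    Criterion("dimension_b", 1.0, True),
]
print(dimension_scores(example))
print(overall_score(example))
```

Reporting the per-dimension fractions alongside the overall fraction is what makes such a score diagnostic: two responses with the same overall score can fail in different dimensions.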

Eval Frameworks & Benchmarks

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
