BIT, HKU, Osaka, Osaka | May 27, 2026 | arXiv:2605.28464

The Cases LJP Never Sees: Prosecution Decision Prediction for More Complete Criminal Liability Assessment

Junyu Lu, Qi Wei, Peishuo Zheng, Hui Huang, Qianru Wang, Chuan Xiao, Jianbin Qin, Shuyuan Zheng

AI Summary

This paper introduces Prosecution Decision Prediction (PDP), a new legal AI task that predicts prosecutorial decisions (prosecution vs. non-prosecution) to address a limitation of existing Legal Judgment Prediction (LJP) benchmarks, which only consider indicted cases. The authors construct PDP-Bench, a dataset of 4,630 Chinese prosecutorial decisions across 190 charges, and find that LLMs perform substantially worse on PDP than on LJP, with standard enhancement techniques failing to close the performance gap. The study also finds that reinforcement learning with outcome rewards struggles to produce generalizable PDP discrimination.

Key Contribution

LLMs struggle to predict prosecutorial decisions, highlighting a critical blind spot in legal AI's ability to assess criminal liability beyond formally indicted cases.

Abstract

Legal Judgment Prediction (LJP) has become a core benchmark for evaluating AI in the criminal legal domain, but it only sees criminal cases that have already passed prosecutorial review and been formally indicted. As a result, LJP leaves a substantial blind spot in assessing criminal liability, overlooking cases involving insufficient evidence, no criminal liability, or guilt exempted from punishment. To fill this gap, we propose Prosecution Decision Prediction (PDP), the first Legal AI task built around prosecutorial review, which classifies each case into prosecution or one of three non-prosecution decisions and reflects legal AI's capabilities in evidence evaluation, legal subsumption, and value-based discretion. We further construct PDP-Bench, a benchmark of 4,630 real Chinese prosecutorial decisions spanning 190 charges. Extensive experiments show that state-of-the-art LLMs perform substantially worse on PDP than on LJP and that mainstream enhancement routes fail to close the gap. Moreover, controlled RLVR interventions show that simple outcome rewards fail to produce generalizable PDP discrimination.
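
As an illustration of the task formulation described in the abstract, the four-way decision space can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' code: the `Decision` label names, the mapping of the three non-prosecution categories to the case types the abstract lists, the `macro_f1` metric, and the `always_prosecute` baseline are all hypothetical choices made for illustration, and the paper's actual metrics may differ.

```python
from enum import Enum


class Decision(Enum):
    """Hypothetical label set: prosecution plus three non-prosecution outcomes."""
    PROSECUTION = "prosecution"
    INSUFFICIENT_EVIDENCE = "non-prosecution: insufficient evidence"
    NO_CRIMINAL_LIABILITY = "non-prosecution: no criminal liability"
    EXEMPT_FROM_PUNISHMENT = "non-prosecution: guilt exempted from punishment"


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all four decisions (assumed metric)."""
    scores = []
    for label in Decision:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no gold and no predicted instances contributes 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def always_prosecute(gold):
    """Trivial baseline that treats every case as if it had been indicted."""
    return [Decision.PROSECUTION] * len(gold)


if __name__ == "__main__":
    # Toy gold labels, invented for illustration only.
    gold = [
        Decision.PROSECUTION,
        Decision.PROSECUTION,
        Decision.INSUFFICIENT_EVIDENCE,
        Decision.NO_CRIMINAL_LIABILITY,
        Decision.EXEMPT_FROM_PUNISHMENT,
    ]
    print(macro_f1(gold, always_prosecute(gold)))
```

A class-balanced metric such as macro-F1 is used in this sketch because a predictor that only ever outputs prosecution scores zero on all three non-prosecution classes, which is the kind of blind spot the abstract attributes to an LJP-only view.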

Constitutional AI & AI Ethics, Eval Frameworks & Benchmarks, Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
