This paper introduces LegalWorld, an interactive environment that models the life cycle of Chinese civil litigation as a causally connected state chain of five stages (seven sub-scenarios), grounded in a dataset of over 75,000 paired judgments. Reusable infrastructure keeps each dispute consistent across its entire litigation process, and on top of this environment the authors build LongJud-Bench, a benchmark that evaluates legal agents across all five connected stages. The results reveal sharp differences in agent capabilities that aggregate scores hide: no single model leads across consultation, drafting, and courtroom advocacy.
Legal agents exhibit starkly different performance across litigation stages, challenging the notion of a one-size-fits-all model in legal AI applications.
Civil litigation is inherently a life-cycle process: what a lawyer drafts on day one constrains what unfolds at trial months later. Yet existing legal benchmarks evaluate isolated subtasks, and prior legal-agent simulators reinitialize each scenario from shared ground truth, leaving cross-stage causal dependencies unmodeled. We present LegalWorld, a life-cycle interactive environment that models Chinese civil litigation as a causally connected state chain of five stages (seven sub-scenarios), grounded in 75,309 paired Chinese civil judgments. We pair it with reusable infrastructure (local memory, global case memory, a Skill/Tool library) that keeps each dispute consistent across its full life cycle. Building on this environment, we construct LongJud-Bench to evaluate agent capability across all five connected stages. Ratings from 217 evaluators with legal backgrounds (18,992 ratings in total) confirm that LegalWorld trajectories are procedurally faithful and role-consistent, and a capability-level cross-model evaluation reveals sharp divergences that aggregate scores cannot expose: no single backbone leads across consultation, drafting, and courtroom advocacy. Detailed resources will be released publicly.
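To make the contrast between a causally connected state chain and per-scenario reinitialization concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: `CaseState`, `run_chained`, `run_reinitialized`, the two toy stages `draft` and `argue` (standing in for the five real stages, which are not listed here), and the rule that only drafted claims can be argued are all hypothetical. The single shared `record` is only a stand-in for the persistent case memory the abstract mentions, not LegalWorld's actual design or API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CaseState:
    """Hypothetical per-dispute state carried from stage to stage."""
    ground_truth: dict                              # facts fixed from the source judgment
    record: list = field(default_factory=list)      # (stage name, artifact) pairs produced so far


# A stage reads the current state and returns an artifact (e.g. a draft or an argument).
Stage = Callable[[CaseState], str]


def run_chained(ground_truth: dict, stages: list[tuple[str, Stage]]) -> CaseState:
    """Thread ONE state through all stages, so each stage sees earlier artifacts."""
    state = CaseState(ground_truth=dict(ground_truth))
    for name, stage in stages:
        state.record.append((name, stage(state)))
    return state


def run_reinitialized(ground_truth: dict, stages: list[tuple[str, Stage]]) -> list[CaseState]:
    """Contrast: every stage starts from a fresh copy of the shared ground truth."""
    states = []
    for name, stage in stages:
        fresh = CaseState(ground_truth=dict(ground_truth))
        fresh.record.append((name, stage(fresh)))
        states.append(fresh)
    return states


# Toy stages (hypothetical names and behaviour): the later stage can only argue
# claims that an earlier drafting stage actually wrote down.
def draft(state: CaseState) -> str:
    # Deliberately drafts only the first claim, so an early choice has a later effect.
    return "claims: " + ", ".join(state.ground_truth["claims"][:1])


def argue(state: CaseState) -> str:
    drafted = [artifact for name, artifact in state.record if name == "draft"]
    return f"argue {drafted[0]}" if drafted else "argue: no drafted claims"


stages = [("draft", draft), ("argue", argue)]
facts = {"claims": ["repayment", "interest"]}

if __name__ == "__main__":
    print(run_chained(facts, stages).record[-1][1])           # argue claims: repayment
    print(run_reinitialized(facts, stages)[-1].record[-1][1])  # argue: no drafted claims
```

In the chained run, `argue` sees the claim that `draft` produced, so the early drafting choice shapes the later stage; in the reinitialized run, `argue` receives an empty record. That missing link is the kind of cross-stage dependency that per-scenario reinitialization leaves unmodeled.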