Computer Science and Engineering | Law and Justice | UNSW | Jun 11, 2026 | arXiv:2606.13184

LAUKIN: A Multi-jurisdictional Common Law Contract Dataset

Amrita Singh, Aditya Joshi, Jiaojiao Jiang, Hye-Young Paik, May Fong Cheong

AI Summary

This paper introduces LAUKIN, a dataset for cross-jurisdictional contract review comprising 14,727 clause pairs from Australia, the UK, and India, of which 3,000 are annotated by legal experts as Equivalent or Not Equivalent. A multi-stage retrieval and reranking pipeline produced the initial clause pair mappings. An evaluation of 12 models yielded a best macro-F1 of 65.11%, indicating that drafting conventions diverge significantly across jurisdictions and that equivalence classification is non-trivial.

Key Contribution

LAUKIN shows that, despite shared legal traditions, drafting conventions diverge across jurisdictions enough to make determining legal equivalence difficult.

Abstract

Multinational companies increasingly require cross-jurisdictional contract review, yet existing legal NLP datasets are largely restricted to a single jurisdiction. We introduce LAUKIN (Legal equivalence dataset of Australia, UK, and INdia), a dataset of clause pairs (AU-UK, UK-IN, IN-AU) labelled for boolean legal equivalence. We develop a novel multi-stage retrieval and reranking pipeline to construct the initial clause pair mapping, with a subset of clause pairs subsequently annotated by legal experts as Equivalent or Not Equivalent. The dataset comprises 14,727 clause pairs from 204 contracts across 8 agreement types, of which 3,000 are manually labelled: 900 train, 600 dev, and 1,500 test. We evaluate 12 models across 4 techniques, achieving a best macro-F1 of 65.11%, establishing LAUKIN as a challenging benchmark. Results reveal that, despite shared legal heritage, drafting conventions diverge significantly across jurisdictions, making cross-jurisdictional equivalence classification non-trivial. LAUKIN also includes 11,727 unlabelled training pairs to support future semi-supervised learning research in legal NLP.
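The split sizes and the macro-F1 metric quoted in the abstract can be illustrated with a minimal sketch. The counts below are taken from the abstract; the helper names (`f1_for_label`, `macro_f1`) and the toy gold/predicted labels are hypothetical, introduced only for illustration, and this is not the authors' evaluation code.

```python
# Counts as reported in the abstract.
SPLITS = {"train": 900, "dev": 600, "test": 1500}  # manually labelled pairs
UNLABELLED = 11_727
TOTAL_PAIRS = 14_727

# The two labels used by the expert annotators.
LABELS = ("Equivalent", "Not Equivalent")


def f1_for_label(gold, pred, label):
    """F1 for one class, treating `label` as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: F1 is zero by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_label(gold, pred, lab) for lab in labels) / len(labels)


# Toy example (hypothetical labels, not drawn from LAUKIN).
gold = ["Equivalent", "Equivalent", "Not Equivalent", "Not Equivalent"]
pred = ["Equivalent", "Not Equivalent", "Not Equivalent", "Not Equivalent"]
score = macro_f1(gold, pred)

labelled_total = sum(SPLITS.values())
```

Because macro-F1 averages the per-class scores without weighting by class frequency, a model that favours one label is penalised on the other, which is one reason the metric suits a two-class equivalence task.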

Data Curation & Synthetic Data, Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 19

Year: 2026

Venue: N/A
