BiometricsAI | Mar 2, 2026 | arXiv:2603.02150

Zero- and Few-Shot Named-Entity Recognition: Case Study and Dataset in the Crime Domain (CrimeNER)

Miguel Lopez-Duran, Julian Fierrez, Aythami Morales, Daniel DeAlcala, Gonzalo Mancera, Javier Irigoyen, Ruben Tolosana, Oscar Delgado, Francisco Jurado, Alvaro Ortigosa

AI Summary

This paper introduces CrimeNERdb, a new dataset of more than 1.5k annotated documents for Named-Entity Recognition (NER) in the crime domain, derived from public reports on terrorist attacks and U.S. Department of Justice press notes. The dataset defines 5 coarse and 22 fine-grained crime-related entity types, addressing the scarcity of annotated data in this domain. The authors evaluate the dataset in zero- and few-shot experiments with state-of-the-art NER models and general-purpose large language models, providing a benchmark for future research.

Key Contribution

CrimeNERdb provides more than 1.5k annotated documents, labeled with 5 coarse and 22 fine-grained entity types, for crime-related information extraction, a domain where adequately annotated NER data is scarce.

Abstract

The extraction of critical information from crime-related documents is a crucial task for law enforcement agencies. Named-Entity Recognition (NER) can perform this task by extracting information about the crime, the criminal, or the law enforcement agencies involved. However, there is a considerable lack of adequately annotated data on general real-world crime scenarios. To address this issue, we present CrimeNER, a case study of crime-related Zero- and Few-Shot NER, and a general crime-related Named-Entity Recognition database (CrimeNERdb) consisting of more than 1.5k annotated documents for the NER task, extracted from public reports on terrorist attacks and the U.S. Department of Justice's press notes. We define 5 coarse crime entity types and a total of 22 fine-grained entity types. We assess the quality of the case study and the annotated data with experiments in Zero- and Few-Shot settings, using State-of-the-Art NER models as well as generalist, commonly used Large Language Models.

Data Curation & Synthetic Data | Eval Frameworks & Benchmarks | Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 45

Year: 2026

Venue: N/A
