College of Information Science · University of Nebraska Omaha · University of Nebraska–Lincoln · Apr 23, 2026 · arXiv:2604.21890

EVENT5Ws: A Large Dataset for Open-Domain Event Extraction from Documents

Praval Sharma, Ashok Samal, Leen-Kiat Soh, Deepti Joshi

AI Summary

The authors introduce EVENT5Ws, a large, manually annotated, open-domain event extraction dataset designed to overcome the limitations of existing datasets in terms of event type coverage and scale. The dataset was created using a systematic annotation pipeline and statistically verified for quality. Experiments using EVENT5Ws to evaluate state-of-the-art LLMs demonstrate its utility as a benchmark and its potential for training generalizable event extraction models.

Key Contribution

Models trained on EVENT5Ws generalize effectively to datasets from different geographical contexts, which suggests the dataset can support the development of generalizable event extraction algorithms.

Abstract

Event extraction identifies the central aspects of events from text. It supports event understanding and analysis, which are crucial for tasks such as informed decision-making in emergencies. Automated event extraction approaches are therefore needed. However, existing datasets for algorithm development have limitations, including limited coverage of event types in closed-domain settings and a lack of large, manually verified datasets in open-domain settings. To address these limitations, we create EVENT5Ws, a large, manually annotated, and statistically verified open-domain event extraction dataset. We design a systematic annotation pipeline to create the dataset and provide empirical insights into annotation complexity. Using EVENT5Ws, we evaluate state-of-the-art pre-trained large language models and establish a benchmark for future research. We further show that models trained on EVENT5Ws generalize effectively to datasets from different geographical contexts, which demonstrates its potential for developing generalizable algorithms. Finally, we summarize the lessons learned during dataset development and provide recommendations to support future large-scale dataset development.

Data Curation & Synthetic Data · Eval Frameworks & Benchmarks · Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 37

Year: 2026

Venue: N/A
