College of Information Science
University of Nebraska Omaha
Apr 23, 2026
arXiv:2604.21885

A Multimodal Text- and Graph-Based Approach for Open-Domain Event Extraction from Documents

AI Summary

This paper introduces MODEE, a multimodal open-domain event extraction approach that leverages both graph-based learning and LLM text representations to improve document-level reasoning. MODEE constructs a document graph to capture contextual, structural, and semantic relationships, mitigating the limitations of LLMs in handling long documents. Experiments show MODEE outperforms existing open-domain event extraction methods and generalizes well to closed-domain settings, achieving state-of-the-art results.
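As a rough illustration of what a document graph can look like, the sketch below builds a toy sentence-level graph. It is not MODEE's construction, which this page does not describe. Everything here is an assumption made for illustration: the function name `build_document_graph`, the use of sentences as nodes, "adjacent" edges as a stand-in for structural links, and word-overlap edges (threshold `min_shared`) as a crude stand-in for semantic links.

```python
from itertools import combinations


def build_document_graph(sentences, min_shared=2):
    """Toy sentence-level document graph (hypothetical, not the paper's method).

    Nodes are sentence indices. Edges are (i, j, kind) tuples.
    """
    # Lowercased whitespace tokens per sentence; a real system would use
    # a proper tokenizer or learned embeddings instead.
    tokens = [set(s.lower().split()) for s in sentences]
    edges = []

    # Structural proxy: connect each sentence to the one that follows it.
    for i in range(len(sentences) - 1):
        edges.append((i, i + 1, "adjacent"))

    # Semantic proxy: connect any two sentences sharing enough words,
    # which lets distant sentences be linked directly.
    for i, j in combinations(range(len(sentences)), 2):
        if len(tokens[i] & tokens[j]) >= min_shared:
            edges.append((i, j, "overlap"))

    nodes = list(range(len(sentences)))
    return nodes, edges


doc = [
    "the flood hit omaha",
    "rescue teams arrived",
    "the flood damaged omaha homes",
]
nodes, edges = build_document_graph(doc)
```

In this toy setup, the first and third sentences are connected directly even though another sentence sits between them, which is the kind of long-range link a purely sequential reading of a long document can miss.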

Key Contribution

LLMs can extract events more effectively when combined with graph-based document representations that overcome their "lost-in-the-middle" limitations.

Abstract

Event extraction is essential for event understanding and analysis. It supports tasks such as document summarization and decision-making in emergency scenarios. However, existing event extraction approaches have limitations: (1) closed-domain algorithms are restricted to predefined event types and thus rarely generalize to unseen types, and (2) open-domain event extraction algorithms, capable of handling unconstrained event types, have largely overlooked the potential of large language models (LLMs) despite their advanced abilities. Additionally, open-domain algorithms do not explicitly model document-level contextual, structural, and semantic reasoning, which are crucial for effective event extraction but remain challenging for LLMs due to the lost-in-the-middle phenomenon and attention dilution. To address these limitations, we propose multimodal open-domain event extraction (MODEE), a novel approach that combines graph-based learning with text-based representations from LLMs to model document-level reasoning. Empirical evaluations on large datasets demonstrate that MODEE outperforms state-of-the-art open-domain event extraction approaches and can be generalized to closed-domain event extraction, where it outperforms existing algorithms.

Multimodal Models, Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 41

Year: 2026

Venue: N/A
