Tsinghua AI | Beihang | CAS | Shanghai AI Lab | SJTU | Feb 26, 2026 | arXiv:2602.23061

MoDora: Tree-Based Semi-Structured Document Analysis System

Bangrui Xu, Qihang Yao, Zirui Tang, Xuanhe Zhou, Yeye He, Shi Yu, Shihan Yu, Qianqian Xu, Bin Wang, Guoliang Li, Conghui He, Fan Wu

AI Summary

The paper introduces MoDora, an LLM-powered system for semi-structured document analysis that addresses three challenges: fragmented OCR output, the representation of hierarchical structure and layout, and the retrieval of relevant information scattered across multiple regions or pages of a document. MoDora uses a local-alignment aggregation strategy to turn OCR-parsed elements into layout-aware components, a Component-Correlation Tree (CCTree) to model inter-component relations and layout distinctions, and a question-type-aware retrieval strategy that handles both location-based and semantic-based questions. In the reported experiments, MoDora outperforms baselines by 5.97%-61.07% in accuracy.

Key Contribution

LLMs can now more accurately answer questions on complex documents thanks to a new system that understands layout and hierarchical relationships between document components.

Abstract

Semi-structured documents integrate diverse interleaved data elements (e.g., tables, charts, hierarchical paragraphs) arranged in various and often irregular layouts. These documents are widely observed across domains and account for a large portion of real-world data. However, existing methods struggle to support natural language question answering over these documents due to three main technical challenges: (1) The elements extracted by techniques like OCR are often fragmented and stripped of their original semantic context, making them inadequate for analysis. (2) Existing approaches lack effective representations to capture hierarchical structures within documents (e.g., associating tables with nested chapter titles) and to preserve layout-specific distinctions (e.g., differentiating sidebars from main content). (3) Answering questions often requires retrieving and aligning relevant information scattered across multiple regions or pages, such as linking a descriptive paragraph to table cells located elsewhere in the document. To address these issues, we propose MoDora, an LLM-powered system for semi-structured document analysis. First, we adopt a local-alignment aggregation strategy to convert OCR-parsed elements into layout-aware components, and conduct type-specific information extraction for components with hierarchical titles or non-text elements. Second, we design the Component-Correlation Tree (CCTree) to hierarchically organize components, explicitly modeling inter-component relations and layout distinctions through a bottom-up cascade summarization process. Finally, we propose a question-type-aware retrieval strategy that supports (1) layout-based grid partitioning for location-based retrieval and (2) LLM-guided pruning for semantic-based retrieval. Experiments show MoDora outperforms baselines by 5.97%-61.07% in accuracy. The code is at https://github.com/weAIDB/MoDora.
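The tree-based idea in the abstract can be illustrated with a small sketch. This is not MoDora's code: the `Node` class, the `summarize_bottom_up` and `retrieve` functions, and the sample document are all hypothetical, and the two callables (`truncate` and the keyword check) are simple stand-ins for the LLM calls the abstract describes. It only shows the general shape of a component tree whose summaries are built from the leaves upward and then used to prune subtrees during semantic retrieval.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One document component (hypothetical structure, not MoDora's)."""
    title: str
    kind: str = "text"  # e.g. "text", "table", "chart"
    content: str = ""
    children: List["Node"] = field(default_factory=list)
    summary: str = ""


def summarize_bottom_up(node: Node, summarize: Callable[[str], str]) -> str:
    """Fill node.summary in post-order, so children are summarized first."""
    child_summaries = [summarize_bottom_up(c, summarize) for c in node.children]
    parts = [p for p in (node.title, node.content, *child_summaries) if p]
    node.summary = summarize(" ".join(parts))
    return node.summary


def retrieve(node: Node, is_relevant: Callable[[str], bool]) -> List[Node]:
    """Top-down pruning: skip any subtree whose summary is judged irrelevant."""
    if not is_relevant(node.summary):
        return []
    if not node.children:
        return [node]
    hits = [h for c in node.children for h in retrieve(c, is_relevant)]
    # If no child matched, fall back to the relevant parent itself.
    return hits or [node]


def truncate(text: str, limit: int = 200) -> str:
    """Stand-in summarizer (assumption): a real system would call an LLM."""
    return text[:limit]


# Hypothetical sample document: a table nested under a chapter title.
doc = Node("Annual Report", children=[
    Node("1 Finance", children=[
        Node("Table 1", kind="table",
             content="revenue 2023: 10M; revenue 2024: 12M"),
        Node("1.1 Notes", content="Figures are unaudited."),
    ]),
    Node("2 Staff", content="Headcount grew to 40."),
])

summarize_bottom_up(doc, truncate)

# Stand-in relevance judge (assumption): keyword match instead of an LLM.
hits = retrieve(doc, lambda s: "revenue" in s.lower())
print([h.title for h in hits])  # ['Table 1']
```

Because each parent summary contains its children's summaries, a relevance check at a parent can rule out an entire subtree in one step, which is the general motivation for pruning over a hierarchy rather than scanning every component.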

Computer Vision, Natural Language Processing, Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 26

Year: 2026

Venue: N/A
