Mar 10, 2026, arXiv:2603.09414

PromptDLA: A Domain-aware Prompt Document Layout Analysis Framework with Descriptive Knowledge as a Cue

Zirui Zhang, Yaping Zhang, Lu Xiang, Yang Zhao, Feifei Zhai, Yu Zhou, Chengqing Zong

AI Summary

The paper introduces PromptDLA, a domain-aware prompting framework for Document Layout Analysis (DLA) that incorporates descriptive knowledge as cues to improve generalization across diverse datasets. PromptDLA uses a domain-aware prompter to customize prompts based on dataset-specific attributes, guiding the DLA model towards relevant features and structures. Experiments demonstrate state-of-the-art performance on DocLayNet, PubLayNet, M6Doc, and D$^4$LA datasets, highlighting the effectiveness of domain-specific prompting for DLA.

Key Contribution

Domain-specific prompts can significantly boost document layout analysis, achieving state-of-the-art results by explicitly guiding models with dataset-aware cues.

Abstract

Document Layout Analysis (DLA) is crucial for document artificial intelligence and has recently received increasing attention, resulting in an influx of large-scale public DLA datasets. Existing work often combines data from various domains in these datasets to improve the generalization of DLA models. However, directly merging the datasets for training often yields suboptimal performance, because it overlooks the different layout structures inherent to each domain. The domains differ in labeling style, document type, and language. This paper introduces PromptDLA, a domain-aware prompter for Document Layout Analysis that leverages descriptive knowledge as cues to integrate domain priors into DLA. The domain-aware prompter customizes prompts based on the specific attributes of the data domain. These prompts then serve as cues that direct the DLA model toward critical features and structures within the data, enhancing its ability to generalize across varied domains. Extensive experiments show that our proposal achieves state-of-the-art performance on DocLayNet, PubLayNet, M6Doc, and D$^4$LA. Our code is available at https://github.com/Zirui00/PromptDLA.
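The idea of turning domain attributes into a descriptive cue can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); the names `DomainInfo`, `build_domain_prompt`, and `attach_prompts`, the prompt template, and the example domains and file names are all hypothetical, chosen only to show how each page in a merged training set might carry a text prompt describing the domain it came from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainInfo:
    """Hypothetical descriptive attributes of one data domain."""
    document_type: str
    language: str
    labeling_style: str


def build_domain_prompt(info: DomainInfo) -> str:
    """Render the domain attributes as a plain-text descriptive cue."""
    return (
        f"Document type: {info.document_type}; "
        f"language: {info.language}; "
        f"labeling style: {info.labeling_style}."
    )


def attach_prompts(samples, domains):
    """Pair each (domain_name, image_path) sample with its domain prompt,
    so pages in a merged training set keep their domain description."""
    return [
        (image_path, build_domain_prompt(domains[name]))
        for name, image_path in samples
    ]


# Illustrative values only; they are not taken from the paper.
domains = {
    "domain_a": DomainInfo("scientific article", "English", "coarse-grained"),
    "domain_b": DomainInfo("financial report", "Chinese", "fine-grained"),
}
samples = [("domain_a", "page_001.png"), ("domain_b", "page_002.png")]
pairs = attach_prompts(samples, domains)
```

In a real pipeline the prompt string would presumably be encoded and fed to the layout model alongside the page image; how PromptDLA does this is not specified in the abstract, so that step is left out here.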

Computer Vision, Data Curation & Synthetic Data, Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
