May 27, 2026 · arXiv:2605.27978

ABot-OCR Technical Report

Kai Jiang, Ruiyan Gong, Xiaolong Cheng, Kangning Niu, Mu Xu

AI Summary

ABot-OCR is an end-to-end vision-language model that transcribes page images directly into Markdown, removing the need for modular pipelines. A dedicated data engine supplies large-scale, structurally consistent supervision, and Decoupled Heterogeneous Document Optimization, a structure-constrained reinforcement learning method, improves textual accuracy and markup well-formedness. ABot-OCR achieves state-of-the-art results among end-to-end systems on OmniDocBench v1.5 and v1.6, substantially narrowing the gap with pipeline baselines, and shows strong multilingual text recognition.

Key Contribution

End-to-end document transcription is a viable alternative to brittle modular pipelines: by converting page images directly into clean Markdown, ABot-OCR achieves state-of-the-art results among end-to-end systems.

Abstract

We introduce ABot-OCR, an end-to-end vision-language model that transcribes a page image directly into clean Markdown in a single forward pass, eliminating the need for brittle modular orchestration. To maximize parsing fidelity, we develop a dedicated data engine that provides large-scale, structurally consistent supervision. Furthermore, we propose Decoupled Heterogeneous Document Optimization, a structure-constrained reinforcement learning method that improves textual accuracy and enforces markup well-formedness beyond what supervised fine-tuning alone achieves. On the OmniDocBench v1.5 and v1.6 benchmarks, ABot-OCR achieves state-of-the-art scores of 92.81 and 93.30, respectively, among all end-to-end systems, substantially narrowing the performance gap to strong pipeline baselines. Finally, multilingual text recognition experiments across ten diverse languages further confirm the generalizability of ABot-OCR.
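The abstract does not specify the reward used by Decoupled Heterogeneous Document Optimization. To illustrate what a "structure-constrained" reward for Markdown transcription can look like, here is a minimal sketch that is NOT the authors' method: every function name is hypothetical, and it assumes tables are emitted as inline HTML and display math is wrapped in `$$` delimiters. It gates a normalized edit-distance text reward on a simple well-formedness check, so malformed markup earns zero reward no matter how close the text is.

```python
from html.parser import HTMLParser

# Hypothetical helpers for illustration only; not the DHDO reward from the paper.


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance using a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def text_reward(pred: str, ref: str) -> float:
    """1 - normalized edit distance, in [0, 1]."""
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(pred, ref) / longest


class _TagBalanceChecker(HTMLParser):
    """Tracks open HTML tags (e.g. <table>, <tr>, <td>) and flags mismatched closes."""
    VOID_TAGS = {"br", "hr", "img"}

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []
        self.ok = True

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <br/> are balanced by construction

    def handle_endtag(self, tag):
        if tag in self.VOID_TAGS:
            return
        if not self.stack or self.stack.pop() != tag:
            self.ok = False


def is_well_formed(markdown: str) -> bool:
    """Assumed structural check: paired $$ delimiters and balanced HTML tags."""
    if markdown.count("$$") % 2 != 0:
        return False
    checker = _TagBalanceChecker()
    checker.feed(markdown)
    checker.close()
    return checker.ok and not checker.stack


def structured_reward(pred: str, ref: str) -> float:
    """Hard structural gate: malformed markup earns zero reward."""
    if not is_well_formed(pred):
        return 0.0
    return text_reward(pred, ref)
```

The hard gate keeps a policy from trading broken structure for higher text similarity. A production reward would more plausibly score each element type with its own metric (for example, tree-edit-distance similarity for tables), which this sketch deliberately omits.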

Computer Vision · Data Curation & Synthetic Data · Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 40

Year: 2026

Venue: N/A
