Tencent AI · Mar 16, 2026 · arXiv:2603.15206

Efficient Document Parsing via Parallel Token Prediction

Lei Li, Ze Zhao, Meng Li, Zhongwang Lun, Yi Yuan, Xingjing Lu, Zheng Wei, Jiang Bian, Zang Li

AI Summary

This paper introduces Parallel-Token Prediction (PTP), a novel plug-in method for vision-language models (VLMs) that enables parallel decoding of multiple tokens for document parsing. PTP inserts learnable tokens into the input sequence and trains the model with specific objectives to achieve parallel generation. Experiments on OmniDocBench and olmOCR-bench show that PTP improves decoding speed by 1.6x-2.2x, reduces hallucinations, and generalizes well.

Key Contribution

Document parsing just got a whole lot faster: a simple plug-in method boosts VLM decoding speed by up to 2.2x while also reducing hallucinations.

Abstract

Document parsing, a fundamental and crucial vision task, is being revolutionized by vision-language models (VLMs). However, the autoregressive (AR) decoding inherent to VLMs creates a significant bottleneck, severely limiting parsing speed. In this paper, we propose Parallel-Token Prediction (PTP), a pluggable, model-agnostic, and simple yet effective method that enables VLMs to generate multiple future tokens in parallel with improved sample efficiency. Specifically, we insert learnable tokens into the input sequence and design corresponding training objectives to equip the model with parallel decoding capabilities for document parsing. Furthermore, to support effective training, we develop a comprehensive data generation pipeline that efficiently produces large-scale, high-quality document parsing training data for VLMs. Extensive experiments on OmniDocBench and olmOCR-bench demonstrate that our method not only significantly improves decoding speed (1.6x-2.2x) but also reduces model hallucinations and exhibits strong generalization abilities.
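To make the autoregressive bottleneck concrete, the following is a minimal sketch of the step-count arithmetic behind parallel decoding. It is not the paper's implementation: the function names and the parameter `tokens_per_step` are hypothetical, and the model is idealized (every forward pass is assumed to cost the same and to yield exactly `tokens_per_step` tokens). Measured speedups, such as the 1.6x-2.2x reported above, will generally differ from this ideal ratio.

```python
import math


def forward_passes(num_tokens: int, tokens_per_step: int) -> int:
    """Count decoder forward passes needed to emit `num_tokens` tokens.

    Assumption (hypothetical, for illustration): each forward pass emits
    exactly `tokens_per_step` tokens; plain autoregressive decoding is the
    special case tokens_per_step == 1.
    """
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    if tokens_per_step < 1:
        raise ValueError("tokens_per_step must be at least 1")
    return math.ceil(num_tokens / tokens_per_step)


def ideal_speedup(num_tokens: int, tokens_per_step: int) -> float:
    """Ratio of autoregressive passes to parallel passes (idealized).

    This ignores any per-pass overhead, so it is only a rough reference
    point for comparison with measured speedups.
    """
    parallel = forward_passes(num_tokens, tokens_per_step)
    if parallel == 0:
        return 1.0  # nothing to decode, so there is no speedup to report
    return forward_passes(num_tokens, 1) / parallel


if __name__ == "__main__":
    # A 1000-token page decoded two tokens at a time needs half the passes.
    print(forward_passes(1000, 1), forward_passes(1000, 2))
    print(ideal_speedup(1000, 2))
```

Under this simplified model, the number of forward passes, not the number of tokens, determines decoding latency, which is why emitting several tokens per pass shortens parsing time for long documents.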

Computer Vision · Inference & Quantization · Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
