This paper introduces PP-OCRv6, a lightweight OCR system that combines a redesigned architecture with data-centric optimization. The backbone and the detection and recognition necks are rebuilt around a unified MetaFormer-style building block, which lets PP-OCRv6 reach higher recognition accuracy and detection Hmean than billion-scale vision-language models with far fewer parameters. On the authors' in-house benchmarks, the medium tier reaches 83.2% recognition accuracy and 86.2% detection Hmean, ahead of PP-OCRv5_server, and the tiny tier runs 3.9× faster than PP-OCRv5_mobile on an Intel Xeon CPU at comparable accuracy.
PP-OCRv6 outperforms billion-scale VLMs on OCR tasks with a fraction of the parameters, achieving state-of-the-art accuracy and speed.
Vision-Language Models (VLMs) have achieved impressive results on general vision-language tasks, yet they suffer from hallucination, imprecise localization, and prohibitive computational cost when applied to dedicated OCR scenarios. This paper presents PP-OCRv6, a lightweight OCR system that combines architectural innovation with data-centric optimization. PP-OCRv6 redesigns the backbone, detection neck, and recognition neck around a unified MetaFormer-style building block with structural reparameterization, decoupling spatial token mixing from channel mixing and supporting both tasks through task-specific stride configurations. Three model tiers (medium, small, tiny) share the same block primitives, covering deployment scenarios from server to edge. On our in-house benchmarks, PP-OCRv6_medium achieves 83.2% recognition accuracy and 86.2% detection Hmean, outperforming PP-OCRv5_server by +5.1% and +4.6%, respectively, while surpassing Qwen3-VL-235B, GPT-5.5, and Gemini-3.1-Pro with orders of magnitude fewer parameters. The tiny tier achieves 3.9$\times$ faster inference than PP-OCRv5_mobile on an Intel Xeon CPU while maintaining comparable accuracy.
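The abstract names structural reparameterization but does not spell out the block layout. As a rough illustration of the general idea only, the sketch below uses a single-channel 1-D toy in which a 3-tap convolution, a 1-tap convolution, and an identity shortcut run in parallel at training time and are then folded into one 3-tap kernel for inference. The branch layout follows RepVGG-style designs and is an assumption, as are all function names; none of this is taken from the PP-OCRv6 implementation.

```python
# Illustrative toy: structural reparameterization in 1-D with one channel.
# ASSUMPTION: the branch layout (3-tap conv + 1-tap conv + identity) is a
# RepVGG-style example, not PP-OCRv6's documented block. All names here
# are hypothetical.

def conv1d_same(x, kernel):
    """Cross-correlate x with an odd-length kernel, zero-padded to keep length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]

def multi_branch(x, k3, k1):
    """Training-time form: three parallel branches, summed element-wise."""
    y3 = conv1d_same(x, k3)    # 3-tap branch
    y1 = conv1d_same(x, [k1])  # 1-tap branch (a per-element scale)
    return [a + b + c for a, b, c in zip(y3, y1, x)]  # plus identity

def reparameterize(k3, k1):
    """Fold the 1-tap and identity branches into the centre tap of k3."""
    merged = list(k3)
    merged[1] += k1 + 1.0  # identity is a 1-tap kernel with weight 1
    return merged

x = [1.0, 2.0, -1.0, 0.5]
k3 = [0.25, 0.5, -0.25]
k1 = 2.0

train_out = multi_branch(x, k3, k1)                   # three branches
deploy_out = conv1d_same(x, reparameterize(k3, k1))   # one merged kernel
```

Because every branch is linear, the merged kernel reproduces the multi-branch output while needing only one convolution at inference time. Real blocks additionally fold normalization statistics into the kernels and operate on multi-channel 2-D feature maps, which this toy omits.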