DAMO | CAS | Apr 13, 2026 | arXiv:2604.11600

Geoparsing: Diagram Parsing for Plane and Solid Geometry with a Unified Formal Language

Peijie Wang, Ming-Liang Zhang, Jun Cao, Chao Deng, Dekang Ran, Hongda Sun, Pi Bu, Xuan Zhang, Yingyao Wang, Fei Yin, Cheng-Lin Liu

AI Summary

This paper introduces a unified formal language for parsing both plane and solid geometry diagrams, addressing the perception bottleneck that limits MLLMs in geometric reasoning. The authors construct GDP-29K, a large-scale dataset of plane and solid geometry diagrams paired with formal language descriptions, and train models using supervised fine-tuning combined with reinforcement learning via verifiable rewards to ensure syntactic correctness and geometric consistency. Results demonstrate state-of-the-art parsing performance and show that the parsed formal descriptions significantly improve MLLM performance on downstream geometry reasoning tasks.

Key Contribution

Unlock geometric reasoning in MLLMs by parsing diagrams into a unified formal language that spans both 2D and 3D geometry.

Abstract

Multimodal Large Language Models (MLLMs) have achieved remarkable progress but continue to struggle with geometric reasoning, primarily due to the perception bottleneck regarding fine-grained visual elements. While formal languages have aided plane geometry understanding, solid geometry, which requires spatial understanding, remains largely unexplored. In this paper, we address this challenge by designing a unified formal language that integrates plane and solid geometry, comprehensively covering geometric structures and semantic relations. We construct GDP-29K, a large-scale dataset comprising 20k plane and 9k solid geometry samples collected from diverse real-world sources, each paired with its ground-truth formal description. To ensure syntactic correctness and geometric consistency, we propose a training paradigm that combines Supervised Fine-Tuning with Reinforcement Learning via Verifiable Rewards. Experiments show that our approach achieves state-of-the-art parsing performance. Furthermore, we demonstrate that our parsed formal descriptions serve as a critical cognitive scaffold, significantly boosting MLLMs' capabilities for downstream geometry reasoning tasks. Our data and code are available at Geoparsing.
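The abstract does not spell out the formal language or the reward design, so the following is only a minimal sketch of what a verifiable reward for syntactic correctness and consistency could look like. The toy grammar (one `Predicate(A, B)` statement per line, single-letter point labels), the `Point` declaration rule, the function names, and the 0.0 / 0.5 / 1.0 reward levels are all assumptions made for illustration, not the authors' method.

```python
import re

# Hypothetical toy grammar (an assumption, not the paper's language):
# one statement per line, e.g. "Line(A, B)", with single-letter point labels.
STATEMENT = re.compile(r"^([A-Z][A-Za-z]*)\(([A-Z](?:\s*,\s*[A-Z])*)\)$")


def parse(description):
    """Return a list of (predicate, args) tuples, or None on a syntax error."""
    statements = []
    for raw in description.strip().splitlines():
        match = STATEMENT.match(raw.strip())
        if match is None:
            return None
        args = tuple(a.strip() for a in match.group(2).split(","))
        statements.append((match.group(1), args))
    return statements


def verifiable_reward(description):
    """Toy reward: 0.0 if unparseable or empty, 0.5 if it parses but uses
    undeclared points, 1.0 if every referenced point was declared."""
    statements = parse(description)
    if not statements:
        return 0.0
    declared = {
        args[0] for pred, args in statements
        if pred == "Point" and len(args) == 1
    }
    used = {a for pred, args in statements if pred != "Point" for a in args}
    return 1.0 if used <= declared else 0.5


if __name__ == "__main__":
    print(verifiable_reward("Point(A)\nPoint(B)\nLine(A, B)"))
    print(verifiable_reward("Point(A)\nLine(A, B)"))
    print(verifiable_reward("Point(A)\nLine(A, B"))
```

Because both checks are computed by a program rather than a learned judge, such a reward can be evaluated automatically on every sampled output during reinforcement learning; a real geometric-consistency check would presumably need to compare against the diagram's ground-truth description as well.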

Computer Vision, Multimodal Models, Reasoning & Chain-of-Thought

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
