This paper introduces a unified formal language for parsing both plane and solid geometry diagrams, addressing the perception bottleneck that limits MLLMs in geometric reasoning. The authors construct GDP-29K, a large-scale dataset of plane and solid geometry diagrams paired with formal language descriptions, and train models by combining supervised fine-tuning with reinforcement learning via verifiable rewards to ensure syntactic correctness and geometric consistency. The results show state-of-the-art parsing performance, and the parsed formal descriptions significantly improve MLLM performance on downstream geometry reasoning tasks.
Unlock geometric reasoning in MLLMs by parsing diagrams into a unified formal language that spans both 2D and 3D geometry.
Multimodal Large Language Models (MLLMs) have achieved remarkable progress but continue to struggle with geometric reasoning, primarily because of a perception bottleneck in recognizing fine-grained visual elements. While formal languages have aided plane geometry understanding, solid geometry, which requires spatial understanding, remains largely unexplored. In this paper, we address this challenge by designing a unified formal language that integrates plane and solid geometry and comprehensively covers geometric structures and semantic relations. We construct GDP-29K, a large-scale dataset comprising 20k plane and 9k solid geometry samples collected from diverse real-world sources, each paired with its ground-truth formal description. To ensure syntactic correctness and geometric consistency, we propose a training paradigm that combines Supervised Fine-Tuning with Reinforcement Learning via Verifiable Rewards. Experiments show that our approach achieves state-of-the-art parsing performance. Furthermore, we demonstrate that our parsed formal descriptions serve as a critical cognitive scaffold, significantly boosting MLLMs' capabilities on downstream geometry reasoning tasks. Our data and code are available at Geoparsing.
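To make the idea of a verifiable reward concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not specify the formal language or the reward design, so everything here is an assumption for illustration: the toy grammar (one statement per line, such as `Line(A,B)`), the `parse` and `reward` helpers, and the use of set-overlap F1 against the reference description as a stand-in for geometric consistency.

```python
import re
from typing import List, Optional

# Hypothetical toy grammar: one statement per line, e.g. "Line(A,B)".
# The paper's actual formal language is not given in the abstract.
STATEMENT = re.compile(r"^[A-Z][A-Za-z]*\([A-Za-z0-9]+(,[A-Za-z0-9]+)*\)$")


def parse(description: str) -> Optional[List[str]]:
    """Return whitespace-normalized statements, or None if any line is malformed."""
    statements = []
    for line in description.strip().splitlines():
        statement = line.replace(" ", "")
        if not statement:
            continue  # skip blank lines
        if not STATEMENT.match(statement):
            return None  # syntax check failed
        statements.append(statement)
    return statements


def reward(predicted: str, reference: str) -> float:
    """Assumed reward: 0 for invalid syntax, otherwise F1 overlap with the reference."""
    pred = parse(predicted)
    ref = parse(reference)
    if not pred or not ref:
        return 0.0  # malformed or empty output earns nothing
    pred_set, ref_set = set(pred), set(ref)
    overlap = len(pred_set & ref_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(ref_set)
    return 2 * precision * recall / (precision + recall)


reference = "Point(A)\nPoint(B)\nLine(A,B)"
print(reward("Point(A)\nPoint(B)\nLine(A,B)", reference))  # exact match
print(reward("Point(A)\nLine(A,B", reference))             # unbalanced parenthesis
```

Both checks are computed by a program rather than a learned judge, which is what makes the reward verifiable. This sketch compares statements as plain strings, so `Line(A,B)` and `Line(B,A)` count as different; a real checker would presumably canonicalize symmetric relations before comparing.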