SJTUYaleApr 21, 2026arXiv:2604.19000

Decompose, Structure, and Repair: A Neuro-Symbolic Framework for Autoformalization via Operator Trees

Zineng Dong, Yi Bai, Yifan Bai, Yantao Li, Yuntian Liu, Tao Luo

AI Summary

The paper introduces Decompose, Structure, and Repair (DSR), a neuro-symbolic framework for autoformalization that translates natural language math problems into formal language by decomposing statements into logical components and mapping them to structured operator trees. This approach allows for precise error localization and repair via sub-tree refinement, addressing the limitations of prior methods that treat formal code as flat sequences. Evaluated on PRIME, a new benchmark of 156 theorems annotated in Lean 4, DSR achieves state-of-the-art performance compared to end-to-end LLMs.

Key Contribution

Autoformalization gets a major upgrade: DSR's neuro-symbolic approach leverages operator trees to outperform end-to-end LLMs, proving that structured representations are key to bridging human and formal mathematics.

Abstract

Statement autoformalization acts as a critical bridge between human mathematics and formal mathematics by translating natural language problems into formal language. While prior works have focused on data synthesis and diverse training paradigms to optimize end-to-end Large Language Models (LLMs), they typically treat formal code as flat sequences, neglecting the hierarchical logic inherent in mathematical statements. In this work, we introduce Decompose, Structure, and Repair (DSR), a neuro-symbolic framework that restructures autoformalization into a modular pipeline. DSR decomposes statements into logical components and maps them to structured operator trees, leveraging this topological blueprint to precisely localize and repair errors via sub-tree refinement. Furthermore, we introduce PRIME, a benchmark of 156 undergraduate and graduate-level theorems selected from canonical textbooks and expertly annotated in Lean 4. Experimental results demonstrate that DSR establishes a new state-of-the-art, consistently outperforming baselines under equivalent computational budgets. The datasets, model, and code will be released to the public soon.

Code Generation & Program Synthesis Natural Language Processing Reasoning & Chain-of-Thought

Citation Metrics

Citations0

Influential citations0

References33

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

Decompose, Structure, and Repair: A Neuro-Symbolic Framework for Autoformalization via Operator Trees

Related Papers