Amazon Science · MiroMind · SFU · UT Austin · May 18, 2026 · arXiv:2605.20244

Lean Refactor: Multi-Objective Controllable Proof Optimization via Agentic Strategy Search

Jialin Lu, Soonho Kong, Rodrigo Stehling, Kaiyu Yang, Zhangyang Wang, Weiran Sun, Wuyang Chen

AI Summary

Lean Refactor is introduced as a retrieval-augmented agentic framework that optimizes Lean proofs for multiple objectives like proof length, compilation cost, and version compatibility. It leverages a frozen LLM agent guided by retrievals from a database of multi-objective refactoring strategies annotated with metadata like supported Lean versions and expected compilation-cost reduction. Experiments demonstrate significant improvements in token-level compression (over 70% on competition benchmarks, over 20% on research repos) and compilation-time reduction (up to 60%) compared to prior methods, along with enhanced version transferability.

Key Contribution

A frozen LLM agent, guided by retrieved refactoring strategies, automatically shortens Lean proofs and makes them more robust across Lean versions, reaching over 70% token-level compression on competition benchmarks and up to 60% compilation-time reduction.

Abstract

We present Lean Refactor, a plug-and-play retrieval-augmented agentic framework for multi-objective, controllable, and version-robust refactoring of Lean proofs. LLM-generated proofs are notoriously correct-but-verbose and brittle across library versions, yet existing refactoring works overlook three practical challenges: 1) Lean refactoring is natively multi-objective (proof length, compilation cost, and version compatibility are often in tension); 2) Lean repositories have fragile compatibility, whereas LLM releases are unaware of Lean/Mathlib versions; 3) Training-based pipelines require repeated fine-tuning with each new LLM release, scaling neither with model churn nor with Lean's release cycle. Lean Refactor steers a frozen agentic LLM with retrievals from a curated database of multi-objective refactoring strategies, each densely annotated with metadata such as supported Lean/Mathlib versions and expected compilation-cost reduction. Experiments show over 70% token-level compression on competition benchmarks, over 20% on research repositories, and up to 60% compilation-time reduction, outperforming prior work and Claude Code. Version-filtered retrieval further improves compression on the target Lean version, and refactored miniF2F proofs exhibit stronger zero-shot version transfer to future Lean releases than their unrefactored counterparts.
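The version-filtered, multi-objective retrieval described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Strategy` fields, the linear weighted score, the strategy names, the version tags, and all numeric values are hypothetical and chosen only to show how metadata such as supported versions and expected compilation-cost reduction could steer which strategies are handed to the agent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    """Hypothetical database entry; field names are assumptions, not the paper's schema."""
    name: str
    supported_versions: frozenset  # Lean/Mathlib version tags this strategy is known to work on
    length_reduction: float        # expected fraction of proof tokens removed
    compile_reduction: float       # expected fraction of compilation time saved


def retrieve(strategies, target_version, w_length=0.5, w_compile=0.5, k=2):
    """Keep strategies compatible with target_version, then rank by a weighted score.

    The weights let the caller trade proof length against compilation cost;
    a linear combination is an assumption made for this sketch.
    """
    compatible = [s for s in strategies if target_version in s.supported_versions]
    ranked = sorted(
        compatible,
        key=lambda s: w_length * s.length_reduction + w_compile * s.compile_reduction,
        reverse=True,
    )
    return ranked[:k]


# Illustrative entries only; names, versions, and numbers are made up.
DB = [
    Strategy("merge_simp_calls", frozenset({"v4.9.0", "v4.12.0"}), 0.30, 0.10),
    Strategy("replace_with_omega", frozenset({"v4.12.0"}), 0.50, 0.20),
    Strategy("inline_have_chain", frozenset({"v4.9.0"}), 0.20, 0.40),
]

# Prioritize compilation cost on an older toolchain.
top = retrieve(DB, "v4.9.0", w_length=0.2, w_compile=0.8, k=1)
print([s.name for s in top])  # ['inline_have_chain']
```

Filtering before ranking means a strategy that is attractive on paper but unsupported on the target toolchain is never shown to the agent, which is one plausible reading of how version-filtered retrieval could help on a specific Lean version.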

Code Generation & Program Synthesis · Reasoning & Chain-of-Thought · Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
