Apr 12, 2026 · arXiv:2604.10392

Intent-aligned Formal Specification Synthesis via Traceable Refinement

Zhe Ye, Aidan Z. H. Yang, Huangyuan Su, Zhenyu Liao, Samuel Tenka, Zhizhen Qin, Udaya Ghai, Dawn Song, Soonho Kong

AI Summary

VeriSpecGen, a traceable refinement framework, synthesizes formal specifications in Lean by decomposing natural language intents into atomic requirements and generating requirement-targeted tests. Traceability maps link validation failures to specific requirements, enabling localized clause-level repairs. With Claude Opus 4.5, VeriSpecGen reaches 86.6% on the VERINA SpecGen task, and it improves over baselines by up to 31.8 points across model families and scales. Training on VeriSpecGen's refinement trajectories improves specification synthesis by 62-106% in relative terms and also transfers gains to general reasoning abilities.

Key Contribution

Training on the iterative refinement trajectories of a traceable specification generator bootstraps better formal specification synthesis, yielding substantial gains in both specification accuracy and general reasoning.

Abstract

Large language models are increasingly used to generate code from natural language, but ensuring correctness remains challenging. Formal verification offers a principled way to obtain such guarantees by proving that a program satisfies a formal specification. However, specifications are frequently missing in real-world codebases, and writing high-quality specifications remains expensive and expertise-intensive. We present VeriSpecGen, a traceable refinement framework that synthesizes intent-aligned specifications in Lean through requirement-level attribution and localized repair. VeriSpecGen decomposes natural language into atomic requirements and generates requirement-targeted tests with explicit traceability maps to validate generated specifications. When validation fails, traceability maps attribute failures to specific requirements, enabling targeted clause-level repairs. VeriSpecGen achieves 86.6% on the VERINA SpecGen task using Claude Opus 4.5, improving over baselines by up to 31.8 points across different model families and scales. Beyond inference-time gains, we generate 343K training examples from VeriSpecGen refinement trajectories and demonstrate that training on these trajectories substantially improves specification synthesis, by a relative 62-106%, and transfers gains to general reasoning abilities.
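The attribution-and-repair loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it uses Python predicates in place of Lean clauses, and every name in it (`TracedCase`, `REQUIREMENTS`, `candidate_spec`, `failing_requirements`, `repair_clause`) is hypothetical. It assumes a toy intent, "return the maximum of a non-empty list", split into two atomic requirements, with each test tagged by the requirement it targets.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical illustration only: none of these names come from VeriSpecGen.

Clause = Callable[[Tuple[int, ...], int], bool]


@dataclass(frozen=True)
class TracedCase:
    inputs: Tuple[int, ...]  # input list for the function being specified
    output: int              # candidate output to check against the spec
    expected: bool           # should the spec accept this (input, output) pair?
    requirement: str         # traceability link to the targeted requirement


# Atomic requirements for the intent "return the maximum of a non-empty list".
REQUIREMENTS: Dict[str, str] = {
    "R1": "the result is an element of the list",
    "R2": "the result is >= every element of the list",
}

# Candidate specification with one clause per requirement.
# The R2 clause is deliberately wrong: it only compares with the first element.
candidate_spec: Dict[str, Clause] = {
    "R1": lambda xs, out: out in xs,
    "R2": lambda xs, out: out >= xs[0],
}

TESTS: List[TracedCase] = [
    TracedCase((1, 3, 2), 3, True, "R1"),
    TracedCase((1, 3, 2), 5, False, "R1"),
    TracedCase((1, 3, 2), 3, True, "R2"),
    TracedCase((1, 3, 2), 2, False, "R2"),
]


def spec_accepts(spec: Dict[str, Clause], xs: Tuple[int, ...], out: int) -> bool:
    """The full specification is the conjunction of its clauses."""
    return all(clause(xs, out) for clause in spec.values())


def failing_requirements(spec: Dict[str, Clause], tests: List[TracedCase]) -> List[str]:
    """Validate the whole spec, then attribute each failure via the test's tag."""
    failed: List[str] = []
    for case in tests:
        verdict = spec_accepts(spec, case.inputs, case.output)
        if verdict != case.expected and case.requirement not in failed:
            failed.append(case.requirement)
    return failed


def repair_clause(spec: Dict[str, Clause], req_id: str, new_clause: Clause) -> Dict[str, Clause]:
    """Localized repair: replace one clause and leave the others untouched."""
    repaired = dict(spec)
    repaired[req_id] = new_clause
    return repaired


if __name__ == "__main__":
    print(failing_requirements(candidate_spec, TESTS))
    fixed = repair_clause(candidate_spec, "R2", lambda xs, out: all(out >= x for x in xs))
    print(failing_requirements(fixed, TESTS))
```

In this toy setup the only failing test is the one tagged `R2`, so only the `R2` clause is rewritten and the `R1` clause is left as it was. In the framework itself, the analogous step would operate on Lean specification clauses, with a language model proposing the repair.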

Code Generation & Program Synthesis · Eval Frameworks & Benchmarks · Reasoning & Chain-of-Thought

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
