Mar 31, 2026 | arXiv:2603.29088

WybeCoder: Verified Imperative Code Generation

Fabian Gloeckle, Mantas Baksys, Darius Feher, Kunhao Zheng, Amaury Hayat, Sean B. Holden, Gabriel Synnaeve, Peter O'Hearn

AI Summary

WybeCoder is introduced as an agentic code verification framework that integrates automatic verification condition generation, SMT solvers, and interactive proofs in Lean, enabling the co-evolution of code, invariants, and proofs. The framework was evaluated by translating functional verification benchmarks (Verina and Clever) to imperative code specifications. Results show that WybeCoder achieves state-of-the-art performance, solving 74% of Verina tasks and 62% of Clever tasks, demonstrating its ability to synthesize valid invariants and dispatch subgoals for complex algorithms like Heapsort.

Key Contribution

WybeCoder shows that LLM-driven agents can verify imperative code automatically, reaching state-of-the-art results on imperative translations of the Verina and Clever benchmarks and paving the way for large-scale verified code datasets.

Abstract

Recent progress in large language models (LLMs) has advanced automatic code generation and formal theorem proving, yet software verification has not seen the same improvement. To address this gap, we propose WybeCoder, an agentic code verification framework that enables prove-as-you-generate development where code, invariants, and proofs co-evolve. It builds on a recent framework that combines automatic verification condition generation and SMT solvers with interactive proofs in Lean. To enable systematic evaluation, we translate two benchmarks for functional verification in Lean, Verina and Clever, to equivalent imperative code specifications. On complex algorithms such as Heapsort, we observe consistent performance improvements by scaling our approach, synthesizing dozens of valid invariants and dispatching dozens of subgoals, resulting in hundreds of lines of verified code, overcoming plateaus reported in previous work. Our best system solves 74% of Verina tasks and 62% of Clever tasks at moderate compute budgets, significantly surpassing previous evaluations and paving a path to automated construction of large-scale datasets of verified imperative code.
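
For readers unfamiliar with loop invariants, the sketch below illustrates the kind of obligation the abstract refers to. It is not WybeCoder's method or output: the function name array_max is hypothetical, the language is Python rather than Lean, and the invariant is only checked at runtime with assertions, whereas a verifier would prove it for all inputs. The three assertion sites correspond to the usual verification conditions for a loop: the invariant holds on entry, it is preserved by the loop body, and together with the exit condition it implies the postcondition.

```python
from typing import List


def array_max(xs: List[int]) -> int:
    """Return the largest element of a non-empty list.

    Hypothetical example: the loop invariant is checked at runtime,
    which tests it on one input and does not prove it in general.
    """
    # Precondition.
    assert len(xs) > 0, "xs must be non-empty"

    best = xs[0]
    i = 1

    # Invariant: 1 <= i <= len(xs) and best == max(xs[:i]).
    # Condition 1: the invariant holds on loop entry.
    assert 1 <= i <= len(xs) and best == max(xs[:i])

    while i < len(xs):
        if xs[i] > best:
            best = xs[i]
        i += 1
        # Condition 2: the loop body preserves the invariant.
        assert 1 <= i <= len(xs) and best == max(xs[:i])

    # Condition 3: invariant plus exit condition (i == len(xs))
    # implies the postcondition.
    assert best == max(xs)
    return best
```

Calling array_max([3, 1, 4, 1, 5]) exercises all three checks. In a verification setting each assertion would instead become a proof goal, discharged automatically by a solver or interactively by a proof.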

Code Generation & Program Synthesis | Reasoning & Chain-of-Thought | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
