Stanford HAI, USC | Mar 4, 2026 | arXiv:2603.04646

HDLFORGE: A Two-Stage Multi-Agent Framework for Efficient Verilog Code Generation with Adaptive Model Escalation

Armin Abdollahi, Saeid Shokoufa, Negin Ashrafi, Mehdi Kamal, Massoud Pedram

AI Summary

HDLFORGE is a two-stage multi-agent framework for Verilog code generation that runs a medium-sized LLM by default and escalates to an ultra-large LLM only when a calibrated score from inexpensive diagnostics (compilation, lint, and smoke tests) indicates it is needed. A counterexample-guided formal agent converts bounded-model-checking traces into reusable micro-tests, reducing bug detection time and repair iterations. Experiments on the VerilogEval and RTLLM benchmarks show improved accuracy-latency trade-offs compared to single-stage systems: HDLFORGE-Qwen reaches 91.2% and 91.8% Pass@1 on VerilogEval Human and VerilogEval V2, respectively, with roughly 50% lower median latency, and 97.2% Pass@5 on RTLLM.

Key Contribution

Achieves roughly 50% lower median latency in Verilog code generation without sacrificing accuracy by escalating from a medium-sized LLM to an ultra-large LLM only when inexpensive diagnostics and formal verification feedback indicate it is needed.

Abstract

We present HDLFORGE, a two-stage multi-agent framework for automated Verilog generation that optimizes the trade-off between generation speed and accuracy. The system uses a compact coder with a medium-sized LLM by default (Stage A) and escalates to a stronger coder with an ultra-large LLM (Stage B) only when needed, guided by a calibrated score from inexpensive diagnostics including compilation, lint, and smoke tests. A key innovation is a counterexample-guided formal agent that converts bounded-model-checking traces into reusable micro-tests, significantly reducing bug detection time and repair iterations. The portable escalation controller can wrap existing Verilog LLM pipelines without modifying their internals. Evaluated on VerilogEval Human, VerilogEval V2, and RTLLM benchmarks, HDLFORGE demonstrates improved accuracy-latency trade-offs compared to single-stage systems through comprehensive analysis of wall-clock time distributions, escalation thresholds, and agent ablations. On VerilogEval Human and VerilogEval V2, HDLFORGE-Qwen achieves 91.2% and 91.8% Pass@1 with roughly 50% lower median latency, dramatically improving accuracy over other medium-sized models, and 97.2% Pass@5 on RTLLM.
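The escalation logic described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the authors' implementation: the `Diagnostics` fields, the weights in `diagnostic_score`, the `threshold` default, and the `stage_a`, `stage_b`, and `diagnose` callables are all hypothetical. The abstract states only that a calibrated score from compilation, lint, and smoke tests decides whether Stage B runs.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Diagnostics:
    """Hypothetical summary of the inexpensive checks run on a candidate."""
    compiles: bool
    lint_warnings: int
    smoke_passed: int
    smoke_total: int


def diagnostic_score(d: Diagnostics) -> float:
    """Map diagnostics to a score in [0, 1].

    The weights below are illustrative placeholders, not values from the paper.
    """
    if not d.compiles:
        return 0.0  # a candidate that does not compile gets the lowest score
    lint_term = 1.0 / (1.0 + d.lint_warnings)
    smoke_term = d.smoke_passed / d.smoke_total if d.smoke_total else 0.0
    return 0.3 * lint_term + 0.7 * smoke_term


def generate_with_escalation(
    spec: str,
    stage_a: Callable[[str], str],
    stage_b: Callable[[str], str],
    diagnose: Callable[[str], Diagnostics],
    threshold: float = 0.8,  # assumed value; the paper analyzes thresholds
) -> Tuple[str, str]:
    """Return (verilog_source, stage_used).

    Stage A (medium-sized model) runs first; Stage B (ultra-large model)
    is called only when the Stage A candidate scores below the threshold.
    """
    candidate = stage_a(spec)
    if diagnostic_score(diagnose(candidate)) >= threshold:
        return candidate, "A"
    return stage_b(spec), "B"
```

Because the controller only calls the two coders and the diagnostic function through plain callables, it can wrap an existing pipeline without touching its internals, which matches the portability property the abstract describes.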

Code Generation & Program Synthesis | Eval Frameworks & Benchmarks | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
