HKUSTMay 26, 2026arXiv:2605.27472

AssertLLM2: A Comprehensive LLM Benchmark for Assertion Generation from Design Specifications

Yuchao Wu, Wenji Fang, Wenkai Li, Ziyan Guo, Zhiyao Xie

AI Summary

AssertLLM2 is introduced as a new benchmark for evaluating LLMs in generating SystemVerilog Assertions (SVAs) from hardware design specifications, addressing limitations in existing benchmarks by using realistic designs, structured specifications, and buggy RTL variants. It supports two practical settings: bug-prevention and bug-hunting, evaluating assertion quality through syntactic validity, formal provability, coverage, and mutation-based bug detection. Experiments using AssertLLM2 establish rigorous baselines for state-of-the-art LLMs in practical hardware verification, highlighting their potential and limitations in this domain.

Key Contribution

LLMs can now be rigorously tested on their ability to automatically generate assertions that catch bugs in real-world hardware designs, thanks to a new benchmark that includes buggy RTL variants.

Abstract

Assertion-based verification (ABV) is a cornerstone of modern hardware design, yet manually translating design intent into formal SystemVerilog Assertions (SVAs) remains labor-intensive and error-prone. While Large Language Models (LLMs) show promise for automating this process, existing benchmarks remain limited by unrealistic task formulations, weak specification inputs, and oversimplified evaluation. To address these limitations, we introduce AssertLLM2, an open-source benchmark for realistic assertion generation in hardware verification. AssertLLM2 contains 83 real-world designs across 13 functional categories. For each design, the benchmark provides a structured design specification, a verified dependency-complete golden RTL, and systematically mutated buggy RTL variants. These support two practical settings: bug-prevention, where assertions are generated from specifications to guard against design errors, and bug-hunting, where assertions are generated to expose discrepancies between intended behavior and faulty implementations. To the best of our knowledge, AssertLLM2 is the first benchmark to explicitly use buggy RTL as input to evaluate bug-detection capability. AssertLLM2 further adopts a more rigorous evaluation framework spanning syntactic validity, formal provability, coverage, and mutation-based bug detection. Our benchmark enables a more realistic and extensive assessment of assertion generation and establishes rigorous baselines for state-of-the-art LLMs in practical hardware verification.

Code Generation & Program Synthesis Eval Frameworks & Benchmarks Natural Language Processing

Citation Metrics

Citations0

Influential citations0

References32

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

AssertLLM2: A Comprehensive LLM Benchmark for Assertion Generation from Design Specifications

Related Papers