Search papers, labs, and topics across Lattice.
AssertLLM2 is introduced as a new benchmark for evaluating LLMs in generating SystemVerilog Assertions (SVAs) from hardware design specifications, addressing limitations in existing benchmarks by using realistic designs, structured specifications, and buggy RTL variants. It supports two practical settings: bug-prevention and bug-hunting, evaluating assertion quality through syntactic validity, formal provability, coverage, and mutation-based bug detection. Experiments using AssertLLM2 establish rigorous baselines for state-of-the-art LLMs in practical hardware verification, highlighting their potential and limitations in this domain.
LLMs can now be rigorously tested on their ability to automatically generate assertions that catch bugs in real-world hardware designs, thanks to a new benchmark that includes buggy RTL variants.
Assertion-based verification (ABV) is a cornerstone of modern hardware design, yet manually translating design intent into formal SystemVerilog Assertions (SVAs) remains labor-intensive and error-prone. While Large Language Models (LLMs) show promise for automating this process, existing benchmarks remain limited by unrealistic task formulations, weak specification inputs, and oversimplified evaluation. To address these limitations, we introduce AssertLLM2, an open-source benchmark for realistic assertion generation in hardware verification. AssertLLM2 contains 83 real-world designs across 13 functional categories. For each design, the benchmark provides a structured design specification, a verified dependency-complete golden RTL, and systematically mutated buggy RTL variants. These support two practical settings: bug-prevention, where assertions are generated from specifications to guard against design errors, and bug-hunting, where assertions are generated to expose discrepancies between intended behavior and faulty implementations. To the best of our knowledge, AssertLLM2 is the first benchmark to explicitly use buggy RTL as input to evaluate bug-detection capability. AssertLLM2 further adopts a more rigorous evaluation framework spanning syntactic validity, formal provability, coverage, and mutation-based bug detection. Our benchmark enables a more realistic and extensive assessment of assertion generation and establishes rigorous baselines for state-of-the-art LLMs in practical hardware verification.