CAS | Mar 15, 2026 | arXiv:2603.14239

QiMeng-CodeV-SVA: Training Specialized LLMs for Hardware Assertion Generation via RTL-Grounded Bidirectional Data Synthesis

Yutong Wu, Chenrui Cao, Pengwei Jin, Di Huang, Rui Zhang, Xishan Zhang, Zidong Du, Qi Guo, Xing Hu

AI Summary

The paper introduces a data synthesis framework that improves how well large language models (LLMs) generate SystemVerilog Assertions (SVAs) from natural language descriptions. The framework uses open-source Register Transfer Level (RTL) code to guide LLM-based SVA generation and applies bidirectional translation as a data selection step, keeping only NL-SVA pairs judged semantically equivalent. Trained on this synthesized data, CodeV-SVA-14B reaches 75.8% on NL2SVA-Human and 84.0% on NL2SVA-Machine in Func.@1, matching or exceeding advanced LLMs such as GPT-5 and DeepSeek-R1.

Key Contribution

Rather than fine-tuning on scarce real-world SVA data, this work bootstraps specialized LLMs for hardware verification from synthetic training data grounded in open-source RTL code.

Abstract

SystemVerilog Assertions (SVAs) are crucial for hardware verification. Recent studies leverage general-purpose LLMs to translate natural language properties to SVAs (NL2SVA), but they perform poorly due to limited data. We propose a data synthesis framework to tackle two challenges: the scarcity of high-quality real-world SVA corpora and the lack of reliable methods to determine NL-SVA semantic equivalence. For the former, large-scale open-source RTLs are used to guide LLMs to generate real-world SVAs; for the latter, bidirectional translation serves as a data selection method. With the synthesized data, we train CodeV-SVA, a series of SVA generation models. Notably, CodeV-SVA-14B achieves 75.8% on NL2SVA-Human and 84.0% on NL2SVA-Machine in Func.@1, matching or exceeding advanced LLMs like GPT-5 and DeepSeek-R1.
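The abstract does not spell out how bidirectional translation selects data, so the following is only a minimal sketch of one plausible reading: describe each SVA in natural language, translate the description back to an SVA, and keep the pair only if the round trip preserves the assertion. All names here (`round_trip_filter`, `normalize`, the toy lookup tables) are hypothetical, the two translation directions are stubbed with dictionaries in place of LLM calls, and whitespace-normalized string comparison stands in for a real equivalence check.

```python
from typing import Callable, Iterable, List, Tuple


def normalize(sva: str) -> str:
    """Toy canonical form: collapse whitespace (stand-in for a real equivalence check)."""
    return " ".join(sva.split())


def round_trip_filter(
    svas: Iterable[str],
    sva_to_nl: Callable[[str], str],
    nl_to_sva: Callable[[str], str],
    equivalent: Callable[[str, str], bool],
) -> List[Tuple[str, str]]:
    """Keep (NL, SVA) pairs whose NL description translates back to an equivalent SVA."""
    kept = []
    for sva in svas:
        nl = sva_to_nl(sva)        # forward direction: SVA -> natural language
        back = nl_to_sva(nl)       # backward direction: natural language -> SVA
        if equivalent(sva, back):  # select the pair only if the round trip agrees
            kept.append((nl, sva))
    return kept


# Hypothetical stand-ins for the two LLM translation directions.
TOY_NL = {
    "assert property (@(posedge clk) req |-> ##1 ack);":
        "After req, ack is high one cycle later.",
    "assert property (@(posedge clk) !(a && b));":
        "Signals a and b are never high together.",
}
TOY_SVA = {
    # Differs from the source SVA only in whitespace, so it survives the filter.
    "After req, ack is high one cycle later.":
        "assert property (@(posedge clk)  req |-> ##1 ack);",
    # Deliberately wrong back-translation (|| instead of &&), so this pair is dropped.
    "Signals a and b are never high together.":
        "assert property (@(posedge clk) !(a || b));",
}

pairs = round_trip_filter(
    TOY_NL.keys(),
    sva_to_nl=TOY_NL.__getitem__,
    nl_to_sva=TOY_SVA.__getitem__,
    equivalent=lambda x, y: normalize(x) == normalize(y),
)
```

In this toy run only the req/ack pair is kept. In a realistic setting the `equivalent` callable would presumably be backed by a formal or simulation-based check rather than string matching; that choice is an assumption here, not something the abstract states.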

Code Generation & Program Synthesis, Data Curation & Synthetic Data, Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
