The paper introduces a data synthesis framework that improves how well large language models (LLMs) generate SystemVerilog Assertions (SVAs) from natural language descriptions. The framework uses open-source Register Transfer Level (RTL) code to guide LLM-based SVA generation, and it uses bidirectional translation as a data selection step to keep only natural language and SVA pairs that are semantically equivalent. Trained on this synthesized data, CodeV-SVA-14B reaches 75.8% on NL2SVA-Human and 84.0% on NL2SVA-Machine in Func.@1, matching or exceeding advanced LLMs such as GPT-5 and DeepSeek-R1.
Forget finetuning on scarce, real-world data: this work shows you can bootstrap specialized LLMs for hardware verification by generating synthetic training data grounded in open-source RTL code.
SystemVerilog Assertions (SVAs) are crucial for hardware verification. Recent studies leverage general-purpose LLMs to translate natural language properties to SVAs (NL2SVA), but they perform poorly due to limited data. We propose a data synthesis framework to tackle two challenges: the scarcity of high-quality real-world SVA corpora and the lack of reliable methods to determine NL-SVA semantic equivalence. For the former, large-scale open-source RTLs are used to guide LLMs to generate real-world SVAs; for the latter, bidirectional translation serves as a data selection method. With the synthesized data, we train CodeV-SVA, a series of SVA generation models. Notably, CodeV-SVA-14B achieves 75.8% on NL2SVA-Human and 84.0% on NL2SVA-Machine in Func.@1, matching or exceeding advanced LLMs like GPT-5 and DeepSeek-R1.
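The abstract does not spell out how bidirectional translation selects data, so the following is only a minimal sketch of one plausible reading: translate an SVA to natural language, translate that description back to an SVA, and keep the pair only if the round trip agrees with the original. The function names (`select_by_round_trip`, `sva_to_nl`, `nl_to_sva`, `equivalent`), the translation direction, and the toy lookup tables are all hypothetical; in practice the two translators would be LLM calls and the equivalence check would likely be a formal or simulation-based comparison rather than string matching.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Pair:
    nl: str   # natural language description
    sva: str  # SystemVerilog Assertion text


def select_by_round_trip(
    svas: Iterable[str],
    sva_to_nl: Callable[[str], str],
    nl_to_sva: Callable[[str], str],
    equivalent: Callable[[str, str], bool],
) -> List[Pair]:
    """Keep (NL, SVA) pairs whose SVA -> NL -> SVA round trip is judged equivalent.

    All three callables are supplied by the caller; this function assumes
    nothing about how translation or equivalence checking is implemented.
    """
    kept: List[Pair] = []
    for sva in svas:
        nl = sva_to_nl(sva)           # forward translation
        back = nl_to_sva(nl)          # backward translation
        if equivalent(sva, back):     # selection criterion
            kept.append(Pair(nl=nl, sva=sva))
    return kept


# --- Toy stand-ins (hypothetical; real translators would be model calls) ---

def normalize(text: str) -> str:
    """Collapse whitespace so trivially different strings compare equal."""
    return " ".join(text.split())


_TOY_NL = {
    "assert property (@(posedge clk) req |-> ##1 ack);":
        "ack follows req one cycle later",
    "assert property (@(posedge clk) !(a && b));":
        "a and b are never both high",
}

_TOY_SVA = {
    # Faithful back-translation (differs only in whitespace).
    "ack follows req one cycle later":
        "assert property (@(posedge clk)  req |-> ##1 ack);",
    # Deliberately wrong back-translation, so this pair is filtered out.
    "a and b are never both high":
        "assert property (@(posedge clk) !(a || b));",
}


def toy_sva_to_nl(sva: str) -> str:
    return _TOY_NL[sva]


def toy_nl_to_sva(nl: str) -> str:
    return _TOY_SVA[nl]


def toy_equivalent(a: str, b: str) -> bool:
    # Assumption: textual equality after whitespace normalization stands in
    # for a real semantic equivalence check.
    return normalize(a) == normalize(b)


selected = select_by_round_trip(
    list(_TOY_NL), toy_sva_to_nl, toy_nl_to_sva, toy_equivalent
)
for pair in selected:
    print(pair.nl, "<->", pair.sva)
```

In this toy run only the `req`/`ack` pair survives, because the second description translates back to a different assertion. Passing the translators and the equivalence check in as callables keeps the selection logic independent of any particular model or verification tool.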