A RoBERTa-base model was fine-tuned to classify CVE descriptions into CWE categories. It was trained on 234,770 CVE descriptions with AI-refined CWE labels and evaluated on agreement-filtered sets. The resulting 125M-parameter model achieves 87.4% top-1 accuracy and 60.7% Macro F1 on a held-out test set. That is 15.5 points more Macro F1 than a TF-IDF baseline, with the advantage showing up particularly on rare weakness categories. On the CTI-Bench benchmark, its accuracy is statistically indistinguishable from that of an 8B-parameter model, despite the RoBERTa model having 64x fewer parameters.
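The gap between a small top-1 gain and a large Macro F1 gain comes from how the two metrics treat class imbalance. Top-1 accuracy is dominated by frequent classes, while Macro F1 averages per-class F1 with equal weight. The sketch below uses a made-up three-class toy set. The CWE IDs are only labels and are not data from the paper. It implements both metrics in plain Python, assuming Macro F1 is the unweighted mean of per-class F1 over every label that appears in either the gold or the predicted list.

```python
def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label equals the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over labels seen in gold or predictions."""
    labels = set(y_true) | set(y_pred)
    f1_scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy data: one frequent class and two rare ones.
y_true = ["CWE-79"] * 18 + ["CWE-787", "CWE-1321"]

# Predictor that always outputs the majority class.
pred_majority = ["CWE-79"] * 20

# Predictor that gets both rare items right but mislabels two frequent ones.
pred_balanced = ["CWE-79"] * 16 + ["CWE-787"] * 2 + ["CWE-787", "CWE-1321"]

for name, pred in [("majority", pred_majority), ("balanced", pred_balanced)]:
    print(f"{name}: acc={accuracy(y_true, pred):.2f} macroF1={macro_f1(y_true, pred):.3f}")
# majority: acc=0.90 macroF1=0.316
# balanced: acc=0.90 macroF1=0.814
```

Both hypothetical predictors reach 90% accuracy. The one that never predicts the rare classes scores far lower on Macro F1, so a sizable Macro F1 gain can coexist with a modest top-1 gain.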
A fine-tuned RoBERTa model matches the CVE-to-CWE classification accuracy of a model 64x larger on CTI-Bench, suggesting that small, specialized models can rival LLMs on narrow tasks.
We present a fine-tuned RoBERTa-base classifier (125M parameters) that maps Common Vulnerabilities and Exposures (CVE) descriptions to Common Weakness Enumeration (CWE) categories. We construct a large-scale training dataset of 234,770 CVE descriptions whose CWE labels were refined with Claude Sonnet 4.6. We also build agreement-filtered evaluation sets that keep only samples on which the NVD and AI-refined labels agree. On our held-out test set (27,780 samples, 205 CWE classes), the model achieves 87.4% top-1 accuracy and 60.7% Macro F1. That is a gain of 15.5 percentage points in Macro F1 over a TF-IDF baseline that already reaches 84.9% top-1 accuracy. Because Macro F1 weights every class equally while top-1 accuracy is dominated by frequent classes, this gain reflects the model's advantage on rare weakness categories. On the external CTI-Bench benchmark (NeurIPS 2024), the model achieves 75.6% strict accuracy (95% CI: 72.8-78.2%). This is statistically indistinguishable from Cisco Foundation-Sec-8B-Reasoning (75.3%, 8B parameters), with 64x fewer parameters. We release the dataset, model, and training code.
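The abstract reports a 95% confidence interval for CTI-Bench strict accuracy. It does not say how that interval was computed or how many benchmark items were scored. One standard option for a binomial proportion is the Wilson score interval, sketched below. The inputs are hypothetical: 756 correct out of n = 1000 items, chosen only to match the 75.6% accuracy. Neither the method nor the sample size should be read as the paper's.

```python
import math


def wilson_interval(correct: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = correct / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical inputs: 756 of 1000 items correct (75.6% strict accuracy).
lo, hi = wilson_interval(756, 1000)
print(f"{lo:.1%} - {hi:.1%}")
# 72.8% - 78.2%
```

Unlike the plain normal approximation, the Wilson interval stays inside [0, 1] and behaves better near 0% or 100%. A percentile bootstrap over per-item correctness is a common alternative.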