This paper benchmarks PLM-GNN hybrid architectures for code classification and vulnerability detection, systematically pairing three code-specialized PLMs (CodeBERT, GraphCodeBERT, PLBart) with three GNN architectures (GCN, GAT, GIN). On the Java250 and Devign datasets, the hybrids consistently outperform GNN-only baselines, and performance is more sensitive to the PLM feature source than to the GNN backbone, especially under identifier obfuscation. The study also finds that larger PLMs do not guarantee better performance in this hybrid setup.
Forget scaling laws: for code classification and vulnerability detection, the *right* code-specialized PLM matters more than GNN architecture or PLM size in PLM-GNN hybrids.
Code understanding models increasingly rely on pretrained language models (PLMs) and graph neural networks (GNNs), which capture complementary semantic and structural information. We conduct a controlled empirical study of PLM-GNN hybrids for code classification and vulnerability detection tasks by systematically pairing three code-specialized PLMs with three foundational GNN architectures. We compare these hybrids against PLM-only and GNN-only baselines on Java250 and Devign, including an identifier-obfuscation setting. Across both tasks, hybrids consistently outperform GNN-only baselines and often improve ranking quality over frozen PLMs. On Devign, performance and robustness are more sensitive to the PLM feature source than to the GNN backbone. We also find that larger PLMs are not necessarily better feature extractors in this pipeline, and that the PLM choice has more impact than the GNN choice. Finally, we distill these findings into practical guidelines for PLM-GNN design choices in code classification and vulnerability detection.
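The abstract does not spell out how PLM features enter the GNN. One common arrangement, assumed here purely for illustration, is to embed each code-graph node with a frozen PLM, pass those embeddings through a graph layer, and pool the node states into a single vector for a classifier. The sketch below follows that arrangement in plain Python. The names `stub_plm_embed`, `gcn_layer`, and `mean_pool` are hypothetical, the hash-based embedding is only a stand-in for a real PLM such as CodeBERT, and the weights are fixed rather than learned, so this is not the paper's implementation.

```python
import hashlib
import math


def stub_plm_embed(token, dim=8):
    """Stand-in for a frozen PLM (assumption): a deterministic
    pseudo-embedding derived from a hash of the node's text."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [digest[i] / 255.0 - 0.5 for i in range(dim)]


def gcn_layer(features, edges, weight):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    features: one feature vector per node
    edges:    (u, v) index pairs, treated as undirected
    weight:   in_dim x out_dim matrix as nested lists
    """
    n = len(features)
    in_dim = len(weight)
    out_dim = len(weight[0])

    # Adjacency matrix with self-loops.
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        adj[u][v] = 1.0
        adj[v][u] = 1.0
    deg = [sum(row) for row in adj]

    out = []
    for i in range(n):
        # Symmetrically normalized sum over node i and its neighbors.
        agg = [0.0] * in_dim
        for j in range(n):
            if adj[i][j]:
                coef = 1.0 / math.sqrt(deg[i] * deg[j])
                for k in range(in_dim):
                    agg[k] += coef * features[j][k]
        # Linear projection followed by ReLU.
        out.append([
            max(0.0, sum(agg[k] * weight[k][o] for k in range(in_dim)))
            for o in range(out_dim)
        ])
    return out


def mean_pool(node_states):
    """Average node vectors into one graph-level vector."""
    n = len(node_states)
    return [sum(col) / n for col in zip(*node_states)]


# Toy code graph (hypothetical): five AST-like nodes and their edges.
node_tokens = ["MethodDeclaration", "ReturnStatement", "BinaryExpr", "a", "b"]
edges = [(0, 1), (1, 2), (2, 3), (2, 4)]

# Fixed illustrative weights; a trained model would learn these.
weight = [[((k + o) % 3 - 1) * 0.5 for o in range(4)] for k in range(8)]

features = [stub_plm_embed(tok) for tok in node_tokens]
graph_vec = mean_pool(gcn_layer(features, edges, weight))
```

In this arrangement, swapping the PLM changes every node's input features, while swapping GCN for GAT or GIN changes only how neighbors are aggregated. That is one intuition, not a result from the paper, for why the feature source could matter more than the backbone.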