The paper introduces TriFusion-LLM, a multimodal fusion framework for fine-grained code clone detection that integrates heuristic similarity priors, AST structural signals, and CodeBERT semantic embeddings. A key innovation is using the primary model's probability distribution to guide selective arbitration by a large language model (LLM), focusing LLM computation on high-uncertainty samples. Experiments on the BigCloneBench benchmark demonstrate that TriFusion-LLM achieves a Macro-F1 score of 0.875, significantly outperforming existing methods while maintaining practical inference costs.
LLMs can boost code clone detection accuracy by selectively arbitrating only the most uncertain cases identified by a multimodal fusion model, achieving state-of-the-art results with minimal computational overhead.
Code clone detection (CCD) supports software maintenance, refactoring, and security analysis. Although pre-trained models capture code semantics, most work reduces CCD to binary classification, overlooking the heterogeneity of clone types and the seven fine-grained categories in BigCloneBench. We present Full Model, a multimodal fusion framework that jointly integrates heuristic similarity priors from classical machine learning, structural signals from abstract syntax trees (ASTs), and deep semantic embeddings from CodeBERT into a single predictor. By fusing structural, statistical, and semantic representations, Full Model improves discrimination among fine-grained clone types while keeping inference cost practical. On the seven-class BigCloneBench benchmark, Full Model raises Macro-F1 from 0.695 to 0.875. Ablation studies show that using the primary model's probability distribution as a prior to guide selective arbitration by a large language model (LLM) substantially outperforms blind reclassification; arbitrating only ~0.2% of high-uncertainty samples yields an additional 0.3 absolute Macro-F1 gain. Overall, Full Model achieves an effective performance-cost trade-off for fine-grained CCD and offers a practical solution for large-scale industrial deployment.
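The selective arbitration step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how uncertainty is measured, so Shannon entropy of the primary model's class probabilities is an assumption here. The function name `selective_arbitration`, the `arbiter` callable (a stand-in for the LLM call), and the `budget_fraction` parameter are hypothetical names introduced for illustration. The default budget of 0.002 mirrors the roughly 0.2% of samples mentioned above.

```python
import math
from typing import Callable, List, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of one probability distribution (assumed uncertainty measure)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def selective_arbitration(
    prob_rows: Sequence[Sequence[float]],
    arbiter: Callable[[int, Sequence[float]], int],
    budget_fraction: float = 0.002,
) -> Tuple[List[int], List[int]]:
    """Keep the primary model's argmax label for most samples and hand only
    the highest-entropy fraction to `arbiter` (hypothetical LLM stand-in).

    The arbiter receives the sample index and the primary model's probability
    distribution, so it can use that distribution as a prior.
    Returns (final_labels, indices_sent_to_arbiter).
    """
    n = len(prob_rows)
    if n == 0:
        return [], []
    # Always arbitrate at least one sample so small batches are still covered.
    k = max(1, math.ceil(n * budget_fraction))
    ranked = sorted(range(n), key=lambda i: entropy(prob_rows[i]), reverse=True)
    uncertain = set(ranked[:k])

    labels: List[int] = []
    for i, probs in enumerate(prob_rows):
        if i in uncertain:
            labels.append(arbiter(i, probs))
        else:
            # Primary model's prediction: index of the largest probability.
            labels.append(max(range(len(probs)), key=probs.__getitem__))
    return labels, sorted(uncertain)


if __name__ == "__main__":
    rows = [[0.90, 0.05, 0.05], [0.40, 0.35, 0.25], [0.98, 0.01, 0.01]]
    # Stub arbiter that always answers class 1; a real system would query an LLM.
    print(selective_arbitration(rows, lambda i, p: 1, budget_fraction=0.3))
```

In this toy run only the second sample, whose distribution is closest to uniform, is routed to the arbiter; the other two keep the primary model's label. Ranking by entropy with a fixed budget caps the arbitration cost regardless of dataset size, which is one plausible way to obtain the performance-cost trade-off the abstract describes.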