MBZUAI · May 5, 2026 · arXiv:2605.04157

FMI_SU_Yotkova_Kastreva at SemEval-2026 Task 13: Lightweight Detection of LLM-Generated Code via Stylometric Signals

Elitsa Yotkova, Violeta Kastreva, D. Dimitrov, Ivan Koychev, Preslav Nakov

AI Summary

This paper explores lightweight feature-based methods for detecting LLM-generated code in SemEval-2026 Task 13, focusing on binary classification (Subtask A). The approach extracts stylometric signals using ratio-based features, parsing engines, a programming-language classifier, and a code-vs-text line classifier. A shallow decision tree combined with heuristic rules achieves competitive performance with near-instant inference, offering a computationally efficient alternative to large pretrained models.
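The exact feature set is not listed here. The following is a minimal sketch, assuming a few illustrative ratios (blank-line, comment-line, average line length, whitespace share), of why ratio-based features are less sensitive to snippet length: each count is divided by a total, so concatenating a snippet with itself leaves the values unchanged. `RatioFeatures`, `ratio_features`, and the comment prefixes are hypothetical choices, not the authors' code.

```python
from dataclasses import dataclass


@dataclass
class RatioFeatures:
    """Hypothetical length-normalized stylometric features for one snippet."""
    blank_line_ratio: float       # blank lines / all lines
    comment_line_ratio: float     # comment-only lines / non-blank lines
    avg_line_length: float        # characters per non-blank line
    whitespace_char_ratio: float  # whitespace characters / all characters


def ratio_features(code: str, comment_prefixes: tuple = ("#", "//")) -> RatioFeatures:
    """Compute ratio features; dividing by totals keeps values comparable across lengths."""
    lines = code.splitlines()
    total_lines = max(len(lines), 1)  # guard against empty input
    non_blank = [ln for ln in lines if ln.strip()]
    n_non_blank = max(len(non_blank), 1)
    # A line counts as a comment if its first non-space text is a comment prefix (assumption).
    comment_lines = sum(1 for ln in non_blank if ln.lstrip().startswith(comment_prefixes))
    total_chars = max(len(code), 1)
    whitespace_chars = sum(1 for ch in code if ch.isspace())
    return RatioFeatures(
        blank_line_ratio=(len(lines) - len(non_blank)) / total_lines,
        comment_line_ratio=comment_lines / n_non_blank,
        avg_line_length=sum(len(ln) for ln in non_blank) / n_non_blank,
        whitespace_char_ratio=whitespace_chars / total_chars,
    )


snippet = "def f(x):\n    # double it\n    return 2 * x\n\n"
print(ratio_features(snippet))
print(ratio_features(snippet * 2))  # same ratios despite twice the length
```

Raw counts (e.g. number of comment lines) would double for the repeated snippet, whereas these ratios stay fixed; a real system would likely add language-aware features, such as those the paper derives from parsers.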

Key Contribution

Heavy transformers are not required: lightweight stylometric features and a shallow decision tree can detect LLM-generated code surprisingly effectively, with near-instant inference.

Abstract

SemEval-2026 Task 13 investigates machine-generated code detection across multiple programming languages and application scenarios, asking participating systems to generalize to unseen languages and domains. This paper describes our participation in Subtask A (binary classification) and explores both pretrained code encoders and lightweight feature-based methods. We design ratio-based features that are less sensitive to snippet length. To support the extraction of descriptiveness-related signals, we use parsing engines and a programming-language classifier. Additionally, we train a separate code-vs-text line classifier to identify raw natural language segments embedded within samples. We combine a shallow decision tree with heuristic rules derived from data analysis to produce the final predictions. Our approach is computationally efficient, requires only CPU resources for training, and achieves near-instant inference time, offering a lightweight alternative to large pretrained models.
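The abstract does not give the tree depth, the features, or the rules, nor how the two are combined. As a stdlib-only sketch, assuming binary labels (0 = human, 1 = machine, an arbitrary convention) and assuming the rules act as overrides checked before the tree, the following trains a small depth-limited CART-style tree with Gini impurity splits. `fit_tree`, `predict`, and the rule interface are hypothetical names for illustration only.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Row = Sequence[float]
Rule = Callable[[Row], Optional[int]]  # returns a label to override, or None to abstain


@dataclass
class Node:
    """Leaf if `label` is set; otherwise a split on x[feature] <= threshold."""
    label: Optional[int] = None
    feature: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None   # rows with x[feature] <= threshold
    right: Optional["Node"] = None  # rows with x[feature] > threshold


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def fit_tree(X: Sequence[Row], y: Sequence[int], max_depth: int = 3) -> Node:
    """Greedily choose the split minimizing weighted Gini impurity, up to max_depth."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return Node(label=majority)
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # split must send rows both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no feature separates the rows
        return Node(label=majority)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return Node(
        feature=f,
        threshold=t,
        left=fit_tree([X[i] for i in li], [y[i] for i in li], max_depth - 1),
        right=fit_tree([X[i] for i in ri], [y[i] for i in ri], max_depth - 1),
    )


def predict_tree(node: Node, x: Row) -> int:
    """Walk from the root to a leaf and return its label."""
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def predict(tree: Node, rules: Sequence[Rule], x: Row) -> int:
    """The first rule that returns a label wins; otherwise fall back to the tree."""
    for rule in rules:
        label = rule(x)
        if label is not None:
            return label
    return predict_tree(tree, x)


# Toy data: two made-up features per snippet.
X = [[0.1, 0.0], [0.2, 0.0], [0.8, 0.0], [0.9, 1.0]]
y = [0, 0, 1, 1]
tree = fit_tree(X, y, max_depth=2)
toy_rule: Rule = lambda x: 0 if x[1] > 0.5 else None  # hypothetical override rule
print(predict(tree, [toy_rule], [0.85, 0.0]))  # decided by the tree
print(predict(tree, [toy_rule], [0.85, 1.0]))  # overridden by the rule
```

In practice, scikit-learn's `DecisionTreeClassifier` with a small `max_depth` would replace `fit_tree`. Keeping the tree shallow makes its splits easy to inspect alongside hand-written rules, and inference reduces to a few comparisons per snippet, which is consistent with the CPU-only, near-instant setting described above.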

Code Generation & Program Synthesis · Eval Frameworks & Benchmarks · Natural Language Processing

Citation Metrics

Citations: 0
Influential citations: 0
References: 20
Year: 2026
Venue: N/A
