This paper investigates abuse behaviors on GitHub, which are often overlooked despite GitHub's critical role in software supply chains. The authors curate a labeled dataset of 392 GitHub abuse instances and propose a taxonomy categorizing abuse symptoms and root causes from a software security perspective. They then develop a unified detection framework that achieves high performance (F1-score > 89%) in identifying various abuse categories across repositories and user accounts.
GitHub abuse is more widespread and varied than previously thought, demanding a unified detection approach to safeguard software supply chains.
GitHub plays a critical role in modern software supply chains, making its security an important research concern. Existing studies have primarily focused on CI/CD automation, collaboration patterns, and community management, while abuse behaviors on GitHub have received little systematic investigation. In this paper, we systematically review and summarize reported GitHub abuse behaviors and conduct an empirical analysis of publicly available abuse cases, curating a manually labeled dataset of 392 GitHub abuse instances. Based on this investigation, we propose a comprehensive taxonomy that characterizes the diverse symptoms and root causes of these abuse behaviors from a software security perspective. Building on this taxonomy, we develop a unified detection framework capable of identifying all abuse categories across repositories and user accounts. Evaluated on the constructed dataset, the proposed framework achieves high performance across all categories (e.g., F1-score exceeding 89%). Collectively, this work advances the understanding of GitHub abuse behaviors and lays the groundwork for large-scale, systematic analysis of the GitHub platform to strengthen software supply chain security.
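The reported result is a per-category F1-score. As a reminder of what that metric measures, the following is a minimal sketch of computing F1 separately for each category from single-label predictions. The category names (`spam`, `malware`, `benign`), the toy labels, and the helper `per_category_f1` are hypothetical and chosen only for illustration; they are not the taxonomy, the data, or the evaluation code of the paper.

```python
from collections import Counter


def per_category_f1(y_true, y_pred):
    """Return {category: F1} for single-label predictions.

    F1 = 2*TP / (2*TP + FP + FN), computed independently per category.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted category gains a false positive
            fn[truth] += 1  # true category gains a false negative

    scores = {}
    for cat in set(y_true) | set(y_pred):
        denom = 2 * tp[cat] + fp[cat] + fn[cat]
        scores[cat] = 2 * tp[cat] / denom if denom else 0.0
    return scores


# Hypothetical labels, NOT taken from the paper's dataset.
y_true = ["spam", "spam", "malware", "benign", "benign", "malware"]
y_pred = ["spam", "benign", "malware", "benign", "benign", "malware"]

scores = per_category_f1(y_true, y_pred)
for category, f1 in sorted(scores.items()):
    print(f"{category}: F1 = {f1:.3f}")
```

Reporting F1 per category rather than a single accuracy figure matters when categories are imbalanced, since a detector can score well overall while missing a rare abuse type entirely.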