The paper introduces an automated approach for classifying source code changes by clustering change metrics to streamline code review. It leverages k-means clustering with cosine similarity on a feature vector of 11 source code metrics (LOC, complexity, file counts, etc.) to group changes. Validated on five software systems, the method achieves a classification purity of 0.75 +/- 0.05 and entropy of 0.37 +/- 0.06, demonstrating its effectiveness in automating the initial stage of code change classification.
Clustering change metrics automates the first stage of source code change classification, reducing code review time while reaching a classification purity of about 0.75.
This paper presents an automated method for classifying source code changes made during software development, based on clustering of change metrics. The method has two steps. First, a metric vector is computed for each code change and the vectors are clustered automatically. Second, an expert maps the resulting clusters to predefined change classes. Automating the first step substantially reduces the time required for code change review. Clustering uses the k-means algorithm with cosine similarity between metric vectors. Eleven source code metrics are employed, covering lines of code, cyclomatic complexity, file counts, interface changes, and structural changes. The method was validated on five software systems, including two open-source projects (Subversion and NHibernate), and achieved a classification purity of P_C = 0.75 +/- 0.05 and an entropy of E_C = 0.37 +/- 0.06 at a significance level of 0.05.
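As a rough illustration of the clustering step and the two quality measures, here is a minimal Python sketch. It is not the paper's implementation: the three-metric toy vectors, the function names (`kmeans_cosine`, `purity_and_entropy`), the random initialization, and the class labels are all hypothetical. Purity and entropy follow the common textbook definitions (entropy normalized by the log of the number of classes), which may differ from the exact formulas behind P_C and E_C.

```python
import math
import random
from collections import Counter


def normalize(v):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (their dot product)."""
    return sum(x * y for x, y in zip(a, b))


def kmeans_cosine(vectors, k, iterations=50, seed=0):
    """k-means with cosine similarity; returns one cluster index per vector."""
    rng = random.Random(seed)
    points = [normalize(v) for v in vectors]
    centroids = rng.sample(points, k)  # assumed init: k random points
    labels = None
    for _ in range(iterations):
        # Assignment: each point goes to its most similar centroid.
        new_labels = [
            max(range(k), key=lambda c: cosine(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update: centroid = mean of members, rescaled to unit length.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    return labels


def purity_and_entropy(labels, classes):
    """Purity and class-normalized entropy of a clustering (textbook forms)."""
    n = len(labels)
    num_classes = len(set(classes))
    clusters = {}
    for lab, cls in zip(labels, classes):
        clusters.setdefault(lab, Counter())[cls] += 1
    purity = sum(max(cnt.values()) for cnt in clusters.values()) / n
    entropy = 0.0
    for cnt in clusters.values():
        size = sum(cnt.values())
        h = -sum((m / size) * math.log(m / size) for m in cnt.values())
        if num_classes > 1:
            h /= math.log(num_classes)  # scale entropy into [0, 1]
        entropy += (size / n) * h
    return purity, entropy


# Hypothetical metric vectors: [LOC changed, complexity delta, files added].
changes = [[10, 2, 0], [20, 4, 0], [5, 1, 0], [0, 0, 3], [0, 0, 1], [0, 0, 6]]
# Hypothetical expert-assigned classes, used only to score the clusters.
expert_classes = ["fix", "fix", "fix", "feature", "feature", "feature"]

cluster_labels = kmeans_cosine(changes, k=2)
purity, entropy = purity_and_entropy(cluster_labels, expert_classes)
print(cluster_labels, purity, entropy)
```

Because cosine similarity ignores vector length, the sketch groups changes by the proportions of their metrics rather than by their size, so a 5-line and a 20-line change with the same metric profile land in the same cluster.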