May 6, 2026 · arXiv:2605.04845

Agentic Repository Mining: A Multi-Task Evaluation

AI Summary

This paper investigates LLM agents that explore software repositories via bash commands to classify artifacts, and compares them to simple LLMs that receive pre-engineered context. The authors evaluate the agents across four tasks and find that they achieve competitive accuracy while retrieving their own context, and that they are robust to context-window limits and artifact size. A manual analysis of disagreements with the ground truth reveals specification ambiguities and labels produced under limited context, suggesting that accuracy measured against such ground truth may underestimate approaches with broader context access.

Key Contribution

LLM agents that autonomously explore code repositories can match the classification accuracy of simple LLMs given pre-engineered context, and the disagreement analysis suggests that human-labeled ground truth may underestimate approaches with broader context access.

Abstract

Mining software repositories often requires classifying artifacts like commits, reviews, code lines, or entire repositories into categories. Human labeling is expensive and error-prone; limited context frequently leads to misclassifications or uncertainty in labels. We investigate whether LLM agents that dynamically explore repositories through standard bash commands can match the classification quality of simple LLMs that receive pre-engineered context. Across four tasks, eight approach configurations, and 4943 classifications, agents achieve competitive accuracy despite retrieving their own context. The primary advantage is robustness: agents avoid context-window overflows and scale independently of artifact size. A manual diagnosis of 100 cases where approaches disagree with the ground truth reveals specification ambiguities and labels produced under limited context, suggesting that accuracy against such ground truth may underestimate approaches with broader context access.
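As a rough illustration of what "dynamically exploring a repository through bash commands" could look like, here is a minimal sketch of an agent loop. It is not the paper's implementation: the action format, the function names (`run_bash`, `classify_with_agent`, `scripted_policy`), the step limit, and the truncation threshold are all assumptions made for this example, and the policy is a scripted stand-in where a real setup would call an LLM.

```python
import subprocess
from typing import Callable, List, Tuple

# Hypothetical action format: ("bash", command) or ("label", category).
Action = Tuple[str, str]
Policy = Callable[[List[str]], Action]


def run_bash(command: str, repo_dir: str, timeout_s: int = 10, max_chars: int = 2000) -> str:
    """Run one shell command inside the repository and return its (truncated) output."""
    try:
        proc = subprocess.run(
            command, shell=True, cwd=repo_dir,
            capture_output=True, text=True, timeout=timeout_s,
        )
        output = proc.stdout + proc.stderr
    except subprocess.TimeoutExpired:
        output = "[command timed out]"
    # Truncation keeps a single large observation from flooding the context.
    if len(output) > max_chars:
        output = output[:max_chars] + "\n[output truncated]"
    return output


def classify_with_agent(policy: Policy, repo_dir: str, task: str, labels: List[str],
                        max_steps: int = 8, fallback: str = "unknown") -> str:
    """Let the policy alternate between bash commands and a final label."""
    transcript = [f"Task: {task}", f"Allowed labels: {', '.join(labels)}"]
    for _ in range(max_steps):
        kind, payload = policy(transcript)
        if kind == "label" and payload in labels:
            return payload
        if kind == "bash":
            transcript.append(f"$ {payload}")
            transcript.append(run_bash(payload, repo_dir))
        else:
            # Unknown action or label outside the allowed set: record and continue.
            transcript.append(f"[invalid action: {kind} {payload}]")
    return fallback  # step budget exhausted without a valid label


def scripted_policy(transcript: List[str]) -> Action:
    """Stand-in for an LLM: list the directory once, then pick a label."""
    if not any(line.startswith("$ ") for line in transcript):
        return ("bash", "ls")
    return ("label", "docs" if "README.md" in transcript[-1] else "code")
```

Calling `classify_with_agent(scripted_policy, "/path/to/repo", "Classify this repository", ["docs", "code"])` runs `ls` once and then returns a label. Because the agent pulls in only the output it asks for, and each observation is capped, the amount of context does not grow with the size of the artifact being classified, which is the robustness property the abstract points to.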

Code Generation & Program Synthesis · Eval Frameworks & Benchmarks · Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
