The paper introduces Merge-Bench, a dataset of 7938 real-world merge conflict hunks extracted from 1439 GitHub repositories, and LLMergeJ, an LLM fine-tuned with Group Relative Policy Optimization (GRPO) to resolve merge conflicts in Java automatically. LLMergeJ (14B) outperforms three commercial LLMs on Java merge conflicts and trails only Gemini 2.5 Pro. An evaluation across 11 programming languages shows that even the best models correctly resolve fewer than 60% of conflicts. The results show both the promise and the current limits of LLMs for automating complex software engineering tasks.
A fine-tuned 14B model resolves Java merge conflicts nearly as well as Gemini 2.5 Pro, yet even the best LLMs fail on more than 40% of conflicts, showing that merge resolution remains far from fully automated.
This paper applies machine learning to the difficult and important task of version control merging. (1) We constructed a dataset, Merge-Bench, of 7938 real-world merge conflict hunks from 1439 GitHub repositories. The ground truth is the merge resolution that developers committed to the repository. Our dataset construction methodology is scalable to arbitrary amounts of data since no manual labeling is required. (2) We trained a model, LLMergeJ, to resolve merge conflicts in Java programs. Our approach uses Group Relative Policy Optimization (GRPO), an online reinforcement learning method, to train a Large Language Model (LLM). (3) We performed two evaluations of the performance of LLMs on resolving merge conflicts. On Java programs, LLMergeJ with 14B parameters outperforms 3 commercial LLMs, trailing only Gemini 2.5 Pro. Across 11 programming languages, commercial LLM performance is largely stable from language to language. The best models correctly resolve less than 60% of merge conflicts.
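To make the GRPO step more concrete, the following is a minimal sketch of the group-relative advantage computation that gives the method its name: several candidate resolutions are sampled for the same conflict, each is scored, and each score is normalized against the mean and standard deviation of its own group. The reward shown here (whitespace-insensitive exact match against the developer's committed resolution) is a hypothetical stand-in, since the abstract does not specify the reward used to train LLMergeJ. The function names and the choice of population standard deviation are likewise assumptions made for illustration.

```python
from statistics import mean, pstdev


def exact_match_reward(candidate: str, ground_truth: str) -> float:
    """Hypothetical reward: 1.0 if the candidate equals the developer's
    resolution after collapsing whitespace, else 0.0.
    This is an illustrative assumption, not the paper's reward."""
    def normalize(text: str) -> str:
        return " ".join(text.split())
    return 1.0 if normalize(candidate) == normalize(ground_truth) else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against its own sampling group:
    advantage_i = (r_i - mean(r)) / (std(r) + eps).
    eps avoids division by zero when every sample gets the same reward."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four sampled resolutions for one conflict hunk.
ground_truth = "int total = a + b;"
candidates = [
    "int total = a + b;",    # matches the developer's resolution
    "int total = a - b;",    # wrong operator
    "int total = b;",        # drops one side
    "int  total = a + b;",   # matches up to whitespace
]
rewards = [exact_match_reward(c, ground_truth) for c in candidates]
advantages = group_relative_advantages(rewards)
```

In this sketch, candidates that match the ground truth receive positive advantages and the others negative ones, so no separate learned value model is needed as a baseline. When every sample in a group earns the same reward, all advantages are zero and that group contributes no learning signal.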