This paper analyzes 9.14 billion GitHub issues, pull requests, and discussions, together with 62,500 repositories, to quantify the growing multilingualism of open-source software (OSS) from 2015 to 2025. The study tracks language use in communication, code, and documentation, revealing a steady increase in non-English participation, particularly in Korean, Chinese, and Russian. The findings highlight a tension between the greater inclusivity that multilingualism brings and the lower visibility and participation that non-English projects receive, indicating that language remains a barrier in OSS.
Despite Unicode and growing global participation, non-English open-source projects still get less visibility, revealing language as a persistent barrier in collaborative software development.
The open-source software (OSS) community has historically been dominated by English as the primary language for code, documentation, and developer interactions. However, with growing global participation and better support for non-Latin scripts through standards like Unicode, OSS is gradually becoming more multilingual. This study investigates the extent to which OSS is becoming more multilingual, analyzing 9.14 billion GitHub issues, pull requests, and discussions, and 62,500 repositories across five programming languages and 30 natural languages, covering the period from 2015 to 2025. We examine six research questions to track changes in language use across communication, code, and documentation. We find that multilingual participation has steadily increased, especially in Korean, Chinese, and Russian. This growth appears not only in issues and discussions but also in code comments, string literals, and documentation files. While this shift reflects greater inclusivity and language diversity in OSS, it also creates language tension. The ability to express oneself in a native language can clash with shared norms around English use, especially in collaborative settings. Non-English or multilingual projects tend to receive less visibility and participation, suggesting that language remains both a resource and a barrier, shaping who gets heard, who contributes, and how open collaboration unfolds.