This study analyzes the quality of Stack Overflow code snippets across US states, focusing on SQL, JavaScript, Python, Ruby, and Java, using static analysis to measure reliability, readability, performance, and security. Results reveal that readability violations are most common, and that major tech hubs don't necessarily have lower violation densities, while states with greater access to computing resources exhibit fewer violations. Qualitative analysis further suggests that established tech regions produce more complex violations, while less mature regions show more fundamental errors.
Stack Overflow code quality varies across US states, and major tech hubs do not necessarily produce the highest-quality code.
Developers frequently reuse Stack Overflow code snippets, yet the quality of these snippets remains unevenly understood, particularly across programming languages and geographic contexts. This study investigates code quality in Stack Overflow answers from contributors located in the United States, focusing on SQL, JavaScript, Python, Ruby, and Java snippets. We evaluate four quality dimensions: reliability, readability, performance, and security. Using language-specific linting and static analysis tools, we quantify violations across states and cities, compute violation densities to enable fair regional comparison, and examine relationships between code quality and state-level diversity indicators. We further conduct inductive content analysis on code snippets from California, Utah, and North Dakota to identify qualitative patterns in code quality violations. Results show that readability violations are the most prevalent across all languages, followed by reliability, performance, and security. Common issues include improper whitespace, inconsistent formatting, program-flow errors, inefficient resource use, unsanitised inputs, and insecure dynamic evaluation. Regional analysis indicates that major technology hubs produce more parsable snippets but do not necessarily exhibit higher violation densities. States with broader access to computing devices, Internet subscriptions, higher income, and more equitable wealth distribution tend to show fewer code quality violations. Qualitative findings suggest that established technology regions often produce more complex violations, while less mature technology regions display more fundamental errors. These findings highlight the socio-technical nature of code quality in community question-answering platforms and suggest that developers should exercise caution when reusing online code snippets.
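The abstract mentions computing violation densities so that regions with very different snippet volumes can be compared fairly, but it does not give the formula. As a minimal sketch, assuming density means violations per line of analyzed code (violations per snippet would be another plausible normalization), the per-state calculation could look like the following. The function name, the record layout, and the sample numbers are all hypothetical and are not taken from the study.

```python
from collections import defaultdict


def violation_density(snippets):
    """Return violations per line of code for each state.

    `snippets` is an iterable of (state, lines_of_code, violation_count)
    tuples. Normalizing by lines of code is an assumption; the study's
    exact definition of density is not stated in the abstract.
    """
    totals = defaultdict(lambda: [0, 0])  # state -> [violations, lines]
    for state, lines_of_code, violation_count in snippets:
        totals[state][0] += violation_count
        totals[state][1] += lines_of_code
    # Skip states with no analyzed lines to avoid dividing by zero.
    return {
        state: violations / lines
        for state, (violations, lines) in totals.items()
        if lines > 0
    }


# Illustrative, made-up records: (state, lines of code, violations found).
sample = [
    ("CA", 10, 2),
    ("CA", 30, 2),
    ("ND", 5, 1),
]
densities = violation_density(sample)
```

With these made-up records, the state contributing more code has the larger raw violation count but the lower density, which is the kind of distortion a normalized measure is meant to remove.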