This paper presents a systematic literature review of the sustainability of code generated by Large Language Models (LLMs), focusing on energy efficiency and resource usage. The review analyzes primary studies to understand how well LLMs produce sustainable code and examines the definitions, metrics, and evaluation strategies used to assess sustainability. The key finding is that research in this area is limited and lacks a standardized framework for measuring the sustainability of LLM-generated code, highlighting the need for more systematic research and clearer definitions.
LLM-generated code's dirty secret: the long-term environmental costs of inefficient code could outweigh the benefits of AI-assisted software engineering.
Large Language Models (LLMs) are widely used in software engineering to generate, complete, translate, and fix code, improving developer productivity. While most research focuses on the energy consumption and carbon emissions of model training and inference, far less attention has been given to the sustainability of the code these models produce. The efficiency of generated code affects the long-term environmental impact of software systems. Inefficient code can increase CPU usage, memory consumption, execution time, and overall energy use during deployment and operation. As LLM-generated code becomes more common in real-world projects, even small inefficiencies can lead to high environmental costs over time. This paper examines existing research on the sustainability of code generated by LLMs. We conduct a systematic literature review to analyze selected primary studies and investigate the extent to which LLMs are capable of producing sustainable code. In addition, we examine how sustainability is defined and measured in this context, including the metrics and evaluation strategies used to assess energy efficiency and resource usage. We also explore whether techniques such as fine-tuning and prompt engineering influence the sustainability of generated code. Through a structured analysis of the selected studies, we categorize research efforts based on their methodological approaches, evaluation practices, and experimental settings. The findings indicate that research in this area remains relatively limited and fragmented, with no widely accepted framework for measuring or benchmarking the sustainability of LLM-generated code. These observations highlight the need for clearer definitions, standardized evaluation methods, and systematic research to support environmentally friendly AI-assisted software engineering.