The paper introduces a Carbon-Aware Resource Management (CA-RM) framework for GPU clusters that minimizes carbon emissions by dynamically adjusting GPU core frequency and placing workloads according to real-time renewable energy availability. The authors define a performance-per-carbon (PPC) metric and formulate carbon-constrained, performance-constrained, and PPC-driven optimization objectives to balance DNN training deadlines, inference latency, and carbon emission budgets. Simulations using real-world renewable energy traces and NVIDIA RTX4090 GPU profiling data show a 35% average carbon reduction relative to competing approaches while meeting service-level agreement (SLA) targets.
Slash your deep learning carbon footprint by 35% without touching hardware or models: a carbon-aware resource manager dynamically juggles GPU frequency and workload placement to align with renewable energy availability.
The explosive growth of artificial intelligence (AI) services has led to massive scaling of GPU computing clusters, causing sharp rises in power consumption and carbon emissions. Although hardware-level accelerator enhancements and deep neural network (DNN) model compression techniques can improve power efficiency, in practice they often face deployment barriers and the risk of accuracy loss. To address these issues without altering hardware or model architectures, we propose a novel Carbon-Aware Resource Management (CA-RM) framework for GPU clusters. To minimize carbon emissions, the CA-RM framework dynamically adjusts energy usage by combining real-time GPU core frequency scaling with intelligent workload placement, aligning computation with the temporal availability of renewable generation. We introduce a new metric, performance-per-carbon (PPC), and develop three optimization formulations (carbon-constrained, performance-constrained, and PPC-driven) that simultaneously respect DNN model training deadlines, inference latency requirements, and carbon emission budgets. Through extensive simulations using real-world renewable energy traces and profiling data collected from an NVIDIA RTX4090 GPU running representative DNN workloads, we show that the CA-RM framework substantially reduces carbon emissions while satisfying service-level agreement (SLA) targets across a wide range of workload characteristics. On average, the proposed framework achieves approximately 35% carbon reduction compared to competing approaches, while still ensuring acceptable processing performance across diverse workload behaviors.
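To make the idea concrete, here is a minimal Python sketch of a PPC-driven frequency choice under a deadline. The abstract does not give the PPC formula, so this sketch assumes PPC means work completed per gram of CO2 emitted. The names `FreqProfile`, `carbon_g`, `ppc`, and `pick_frequency`, and every number in `PROFILES`, are hypothetical illustrations, not the paper's formulation or measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FreqProfile:
    """One profiled operating point of a GPU (values below are made up)."""
    freq_mhz: int       # GPU core frequency setting
    throughput: float   # samples processed per second at this frequency
    power_w: float      # average power draw in watts at this frequency


def carbon_g(power_w: float, seconds: float, intensity_g_per_kwh: float) -> float:
    """Grams of CO2 emitted by drawing `power_w` watts for `seconds`."""
    energy_kwh = power_w * seconds / 3_600_000  # watt-seconds -> kWh
    return energy_kwh * intensity_g_per_kwh


def ppc(profile: FreqProfile, intensity_g_per_kwh: float) -> float:
    """Assumed PPC definition: samples processed per gram of CO2."""
    if intensity_g_per_kwh <= 0:
        raise ValueError("carbon intensity must be positive")
    return profile.throughput / carbon_g(profile.power_w, 1.0, intensity_g_per_kwh)


def pick_frequency(profiles, work_samples, deadline_s, intensity_g_per_kwh):
    """Return the deadline-feasible profile with the highest PPC, or None."""
    feasible = [p for p in profiles if work_samples / p.throughput <= deadline_s]
    if not feasible:
        return None
    return max(feasible, key=lambda p: ppc(p, intensity_g_per_kwh))


# Hypothetical profiling table: higher frequency is faster but less efficient.
PROFILES = [
    FreqProfile(freq_mhz=1200, throughput=400.0, power_w=200.0),
    FreqProfile(freq_mhz=1800, throughput=560.0, power_w=300.0),
    FreqProfile(freq_mhz=2400, throughput=640.0, power_w=420.0),
]

if __name__ == "__main__":
    choice = pick_frequency(PROFILES, work_samples=1_000_000,
                            deadline_s=2000, intensity_g_per_kwh=400.0)
    print(choice)
```

In this toy model a tight deadline rules out the lowest frequency, and the selector then picks the most carbon-efficient setting that still finishes on time. For a fixed job, the frequency ranking does not depend on the grid's carbon intensity. Intensity matters when deciding when or where to run the job, which is the role the abstract assigns to workload placement.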