CMU ML · Apr 6, 2026 · arXiv:2604.04745

The Energy Cost of Execution-Idle in GPU Clusters

Yiran Lei, Jared Fernandez, Vasilis Kypriotis, D. Skarlatos, Emma Strubell, Justine Sherry, Daniel Vosler

AI Summary

This paper characterizes "execution-idle" on GPUs, a state of high power consumption despite low activity, using per-second telemetry from a large academic AI cluster. Across diverse workloads and GPU generations, the authors find that execution-idle accounts for 19.7% of in-execution time and 10.7% of energy. They prototype two mitigations: automatic downscaling during execution-idle, which lowers its cost, and deliberate load imbalance, which reduces exposure to it. Both carry performance trade-offs. The authors conclude that energy-efficient GPU systems need to manage this state explicitly.
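The two headline numbers are shares of in-execution time and energy. Below is a minimal sketch of how they could be computed. It assumes one telemetry sample per GPU per second, and each sample carries a job-running flag, an activity fraction, and board power. The `Sample` schema, the function names, and both thresholds (5% activity, 100 W) are hypothetical, since the summary gives no numeric cutoffs. Measuring the energy share against in-execution energy, rather than total GPU energy, is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass
class Sample:
    """One per-second telemetry sample for a single GPU (hypothetical schema)."""
    in_execution: bool  # True while a job is running on this GPU
    activity: float     # fraction of the second the GPU was busy, 0.0 to 1.0
    power_w: float      # average board power over the second, in watts


def is_execution_idle(s: Sample, activity_max: float = 0.05,
                      power_min_w: float = 100.0) -> bool:
    """Hypothetical rule: a job is running, activity is near zero, power is high."""
    return s.in_execution and s.activity <= activity_max and s.power_w >= power_min_w


def execution_idle_shares(samples: Iterable[Sample],
                          activity_max: float = 0.05,
                          power_min_w: float = 100.0) -> tuple[float, float]:
    """Return (time share, energy share) of execution-idle within in-execution samples."""
    exec_seconds = idle_seconds = 0
    exec_joules = idle_joules = 0.0
    for s in samples:
        if not s.in_execution:
            continue  # true idle (no job assigned) is excluded from the denominator
        exec_seconds += 1
        exec_joules += s.power_w  # 1 s sample: energy (J) = power (W) * 1 s
        if is_execution_idle(s, activity_max, power_min_w):
            idle_seconds += 1
            idle_joules += s.power_w
    time_share = idle_seconds / exec_seconds if exec_seconds else 0.0
    energy_share = idle_joules / exec_joules if exec_joules > 0 else 0.0
    return time_share, energy_share
```

Each execution-idle second draws less power than a busy second but more than true idle. As a result, its energy share comes out below its time share, which is consistent with 10.7% of energy versus 19.7% of time.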

Key Contribution

GPUs spend almost 20% of their in-execution time (19.7%) in a high-power, low-activity state. That state accounts for 10.7% of their energy, which points to a substantial energy-saving opportunity in AI clusters.

Abstract

GPUs are becoming a major contributor to data center power, yet unlike CPUs, they can remain at high power even when visible activity is near zero. We call this state execution-idle. Using per-second telemetry from a large academic AI cluster, we characterize execution-idle as a recurring low-activity yet high-power state in real deployments. Across diverse workloads and multiple GPU generations, it accounts for 19.7% of in-execution time and 10.7% of energy. This suggests a need to both reduce the cost of execution-idle and reduce exposure to it. We therefore build two prototypes: one uses automatic downscaling during execution-idle, and the other uses load imbalance to reduce exposure, both with performance trade-offs. These findings suggest that future energy-efficient GPU systems should treat execution-idle as a first-class operating state.
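The abstract describes the downscaling prototype only at a high level. One plausible shape for its trigger is a hysteresis policy: downscale only after several consecutive low-activity seconds, and restore full speed as soon as work reappears. This is sketched here as an assumption, not as the authors' design. `DownscaleController`, the 5% activity threshold, and the 3-second entry delay are hypothetical. The actual actuation step, such as lowering GPU clocks, is left out.

```python
from dataclasses import dataclass, field


@dataclass
class DownscaleController:
    """Hypothetical hysteresis policy deciding when a GPU should be downscaled."""
    activity_max: float = 0.05  # assumed "low activity" threshold (fraction busy)
    enter_after_s: int = 3      # assumed consecutive low-activity seconds before downscaling
    downscaled: bool = field(default=False, init=False)
    _idle_streak: int = field(default=0, init=False, repr=False)

    def step(self, activity: float) -> bool:
        """Feed one per-second activity sample; return True if the GPU should be downscaled."""
        if activity <= self.activity_max:
            self._idle_streak += 1
            if self._idle_streak >= self.enter_after_s:
                self.downscaled = True
        else:
            # Any real work ends the low-activity streak and requests full speed again.
            self._idle_streak = 0
            self.downscaled = False
        return self.downscaled
```

The entry delay is where the performance trade-off shows up. A longer delay leaves more execution-idle energy unrecovered. A shorter one risks downscaling during brief gaps between bursts of work and paying a slowdown when work resumes.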

Distributed Systems & Hardware · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 52

Year: 2026

Venue: N/A
