This paper compares the energy efficiency of NVIDIA H100 and H200 GPUs under different power caps, focusing on how their distinct memory technologies (HBM2e vs. HBM3e) affect the power distribution between memory and the SMs. Using compute-bound (DGEMM) and memory-bound (TheBandwidthBenchmark) workloads, the study applies regression analysis to identify the memory power limit and to find outliers that consume more memory power. Results show that the H100 is slightly more efficient for compute-bound tasks, while the H200 is more efficient for memory-bound applications across the tested power caps.
Turns out, the latest and greatest GPU isn't always the most energy-efficient: NVIDIA's H100 narrowly beats the H200 for compute-bound workloads under power constraints.
Modern NVIDIA GPUs like the H100 (HBM2e) and H200 (HBM3e) share similar compute characteristics but differ significantly in memory interface technology and bandwidth. With memory bandwidth isolated as the key variable, the power distribution between the memory and the Streaming Multiprocessors (SMs) changes notably between the two architectures. In the era of energy-efficient computing, analyzing how these hardware characteristics impact performance per watt is critical. This study investigates how the H100 and H200 manage memory power consumption at various power-cap levels. Using regression analysis, we study the memory power limit and uncover outliers that consume more memory power. To evaluate efficiency, we employ compute-bound (DGEMM) and memory-bound (TheBandwidthBenchmark) workloads, representing the two extremes of the Roofline model. Our observations indicate that across varying power caps, the H100 remains the slightly better choice for strictly compute-bound workloads, whereas the H200 demonstrates superior efficiency for memory-bound applications.
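To make the analysis described in the abstract more concrete, the following is a minimal sketch of how a regression over power-cap measurements might flag runs that draw unusually high memory power, and how performance per watt could be computed. It is not the authors' method: the ordinary least-squares fit, the residual threshold `k`, and the function names `fit_line`, `flag_outliers`, and `perf_per_watt` are all assumptions chosen for illustration, and the numbers in the usage example are placeholders, not measurements from the paper.

```python
from statistics import mean, pstdev


def fit_line(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept (hypothetical helper)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def flag_outliers(x, y, k=2.0):
    """Return indices whose y lies more than k residual standard deviations
    ABOVE the fitted line, i.e. points consuming more than the trend predicts.
    The threshold k is an assumption, not a value taken from the paper."""
    slope, intercept = fit_line(x, y)
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    sigma = pstdev(residuals)
    if sigma == 0:
        return []  # perfect fit: nothing stands out
    return [i for i, r in enumerate(residuals) if r > k * sigma]


def perf_per_watt(throughput, power_w):
    """Efficiency metric: work rate (e.g. GFLOP/s or GB/s) per watt."""
    return throughput / power_w


if __name__ == "__main__":
    # Placeholder values only, NOT measurements from the paper:
    # x plays the role of a power cap, y the observed memory power.
    caps = list(range(10))
    mem_power = [2.0 * c for c in caps]
    mem_power[5] += 30.0  # inject one artificially high reading
    print("outlier indices:", flag_outliers(caps, mem_power))
    print("efficiency:", perf_per_watt(1000.0, 250.0))
```

Only positive residuals are flagged because the abstract speaks of outliers that consume more memory power; a two-sided test would be an equally reasonable choice for other questions.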