Friedrich-Alexander University of Erlangen-Nuremberg | Apr 13, 2026 | arXiv:2604.11391

Architectural Trade-offs in the Energy-Efficient Era: A Comparative Study of power-capping NVIDIA H100 and H200

Aditya Ujeniya, Jan Eitzinger, Georg Hager, Gerhard Wellein

AI Summary

This paper compares the energy efficiency of NVIDIA H100 and H200 GPUs under different power caps. It focuses on how their distinct memory technologies (HBM2e vs. HBM3e) affect the distribution of power between the memory and the Streaming Multiprocessors (SMs). Using compute-bound (DGEMM) and memory-bound (TheBandwidthBenchmark) workloads, the study applies regression analysis to identify memory power limits and outliers. Results show that the H100 is slightly more efficient for compute-bound tasks, while the H200 is more efficient for memory-bound applications across the power caps tested.
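The summary does not say which regression model is used. As a minimal sketch, assuming memory power grows roughly linearly with achieved memory bandwidth, an ordinary least-squares fit can estimate that trend, and samples whose residual exceeds a chosen multiple k of the residual standard deviation can be flagged as outliers that consume more memory power than expected. The function names, the linear model, and the threshold k are illustrative assumptions, not the authors' method.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class LinearFit:
    slope: float      # extra memory watts per GB/s of achieved bandwidth (assumed linear model)
    intercept: float  # modeled memory power at zero bandwidth, in W
    resid_std: float  # standard deviation of the residuals, in W

    def predict(self, x):
        return self.slope * x + self.intercept


def fit_memory_power(bandwidth_gbs, memory_power_w):
    """Ordinary least-squares fit: memory_power = slope * bandwidth + intercept."""
    n = len(bandwidth_gbs)
    if n < 3 or n != len(memory_power_w):
        raise ValueError("need at least 3 paired samples")
    mx = sum(bandwidth_gbs) / n
    my = sum(memory_power_w) / n
    sxx = sum((x - mx) ** 2 for x in bandwidth_gbs)
    if sxx == 0:
        raise ValueError("bandwidth samples must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(bandwidth_gbs, memory_power_w))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [y - (slope * x + intercept) for x, y in zip(bandwidth_gbs, memory_power_w)]
    # n - 2 degrees of freedom, because two parameters were estimated.
    resid_std = sqrt(sum(r * r for r in resid) / (n - 2))
    return LinearFit(slope, intercept, resid_std)


def high_power_outliers(bandwidth_gbs, memory_power_w, fit, k=2.0):
    """Indices of samples drawing more memory power than the fit predicts
    by more than k residual standard deviations (one-sided test)."""
    return [
        i
        for i, (x, y) in enumerate(zip(bandwidth_gbs, memory_power_w))
        if y - fit.predict(x) > k * fit.resid_std
    ]
```

The test is one-sided on purpose, matching the interest in samples that consume more memory power; a single large outlier also inflates resid_std, so a robust fit would be a reasonable refinement.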

Key Contribution

Turns out, the newest GPU isn't always the most energy-efficient one: under power caps, NVIDIA's H100 remains slightly more efficient than the H200 for compute-bound workloads, while the H200 leads for memory-bound ones.
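Energy efficiency here means performance per watt, which equals useful work per joule. The hypothetical sketch below shows how per-cap comparisons could be tabulated from measured runs; the Run record, its fields, and the helper names are assumptions for illustration, and work could be flops for DGEMM or bytes moved for a bandwidth kernel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    device: str         # hypothetical device label
    cap_w: int          # configured power cap, in W
    work: float         # flops (DGEMM) or bytes moved (bandwidth kernel)
    seconds: float      # measured runtime
    avg_power_w: float  # measured average GPU power over the run


def efficiency(run):
    """Work per joule. Since energy = power * time, this equals
    (work / seconds) / avg_power_w, i.e. performance per watt."""
    return run.work / (run.avg_power_w * run.seconds)


def most_efficient_per_cap(runs):
    """Map each power cap to the device with the highest work per joule."""
    best = {}
    for run in runs:
        current = best.get(run.cap_w)
        if current is None or efficiency(run) > efficiency(current):
            best[run.cap_w] = run
    return {cap: run.device for cap, run in sorted(best.items())}
```

Comparing per cap, rather than at a single operating point, is what allows a verdict like "across varying power caps" to be checked.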

Abstract

Modern NVIDIA GPUs such as the H100 (HBM2e) and H200 (HBM3e) share similar compute characteristics but differ significantly in memory interface technology and bandwidth. With memory bandwidth isolated as the key differing variable, the distribution of power between the memory and the Streaming Multiprocessors (SMs) changes notably between the two architectures. In the era of energy-efficient computing, analyzing how these hardware characteristics affect performance per watt is critical. This study investigates how the H100 and H200 manage memory power consumption at various power-cap levels. Using regression analysis, we study the memory power limit and uncover outliers that consume more memory power. To evaluate efficiency, we employ compute-bound (DGEMM) and memory-bound (TheBandwidthBenchmark) workloads, representing the two extremes of the Roofline model. Our observations indicate that across varying power caps, the H100 remains the slightly better choice for strictly compute-bound workloads, whereas the H200 demonstrates superior efficiency for memory-bound applications.
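The Roofline model bounds attainable performance by min(peak compute, arithmetic intensity × memory bandwidth). The sketch below uses textbook intensity estimates for a STREAM-style triad and for DGEMM to show why extra bandwidth should help only the memory-bound extreme. The peak and bandwidth numbers in the test are hypothetical placeholders, not H100 or H200 specifications, and the intensity formulas assume ideal data reuse.

```python
def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline: performance is capped by the compute ceiling or by
    intensity (flop/byte) times memory bandwidth (GB/s), whichever is lower."""
    return min(peak_gflops, intensity * bandwidth_gbs)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Intensity (flop/byte) at which the two ceilings meet."""
    return peak_gflops / bandwidth_gbs


def triad_intensity():
    """FP64 a[i] = b[i] + s * c[i]: 2 flops per 24 bytes (load b, load c,
    store a), assuming the store causes no extra write-allocate traffic."""
    return 2.0 / 24.0


def dgemm_intensity(n):
    """FP64 C = A @ B with n x n matrices: 2 * n**3 flops over a lower bound
    of 3 * 8 * n**2 bytes, i.e. each matrix moved to or from memory once."""
    return (2.0 * n**3) / (3 * 8 * n**2)
```

With a fixed compute peak, raising bandwidth by 1.5x raises the triad ceiling by the same factor but leaves the DGEMM ceiling at the compute peak, which is the Roofline intuition behind the split verdict above. Power capping is not modeled; one crude way to explore it would be to lower peak_gflops, on the assumption that a cap mostly throttles SM clocks.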

Architecture Design (Transformers, SSMs, MoE) | Distributed Systems & Hardware | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 8

Year: 2026

Venue: N/A
