NVIDIA | Jun 4, 2026 | arXiv:2606.05951

Demystifying NVSHMEM: A System-Level Analysis on Symmetric Memory and Device-Initiated Operations in GPU Communication

Siyuan Shen, Tiancheng Chen, Akhil Langer, Jiri Kraus, Benjamin Glick, Craig Belusar, Jeff Hammond, Torsten Hoefler

AI Summary

This paper presents a concise system-level study of NVSHMEM, NVIDIA's OpenSHMEM-based PGAS communication library for GPU clusters, covering its programming model, implementation, and performance characteristics with a focus on symmetric memory, one-sided operations, and device-side collectives. Using DeepEP as a case study, it examines how NVSHMEM supports fine-grained, GPU-driven communication in performance-critical sparse deep learning workloads. The analysis positions NVSHMEM as a systems building block, highlights its design tradeoffs, and identifies opportunities for improving GPU communication runtimes.

Key Contribution

NVSHMEM pioneered a device-side symmetric-memory programming model that enables fine-grained, GPU-driven communication and is important for approaching the hardware performance limit.

Abstract

NVSHMEM is NVIDIA's OpenSHMEM-based PGAS communication library for GPU clusters, enabling GPU-initiated, one-sided communication through symmetric memory. Despite its growing adoption, a system-level understanding of its design and behavior remains scattered across documentation, source code, and application experience. This paper presents a concise study of NVSHMEM's programming model, implementation, and performance characteristics, focusing on symmetric memory, one-sided operations, and device-side collectives. We also examine DeepEP as a case study of NVSHMEM in performance-critical sparse deep learning workloads. Our analysis shows that NVSHMEM pioneered a device-side symmetric-memory programming model that enables fine-grained GPU-driven communication and is important for approaching the hardware performance limit. Overall, this work defines NVSHMEM's role as a systems building block, highlights its design tradeoffs, and identifies opportunities for improving GPU communication runtimes.
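
As a rough illustration of the two ideas the abstract centers on, symmetric memory and one-sided operations, the following is a minimal host-side toy model in Python. It is not NVSHMEM code and does not reflect the paper's implementation: `SymmetricHeap`, `malloc`, `put`, `get`, and `ring_shift` are hypothetical names chosen for this sketch. It assumes only the general PGAS convention that a symmetric allocation resides at the same offset on every processing element (PE), and that a put is completed by the initiator without a matching receive on the target.

```python
"""Toy model of symmetric memory and one-sided puts (NOT the NVSHMEM API)."""


class SymmetricHeap:
    """One heap per PE; every allocation gets the same offset on all PEs."""

    def __init__(self, n_pes, heap_size):
        self.n_pes = n_pes
        # Each PE owns a private heap of identical size.
        self.heaps = [[0] * heap_size for _ in range(n_pes)]
        # A single bump pointer models the collective nature of allocation.
        self.next_free = 0

    def malloc(self, n_elems):
        """Reserve n_elems at the same offset on every PE and return that offset."""
        if self.next_free + n_elems > len(self.heaps[0]):
            raise MemoryError("symmetric heap exhausted")
        offset = self.next_free
        self.next_free += n_elems
        return offset

    def put(self, offset, value, target_pe):
        """One-sided write: the initiator stores directly into target_pe's heap."""
        self.heaps[target_pe][offset] = value

    def get(self, offset, source_pe):
        """One-sided read from source_pe's heap at a symmetric offset."""
        return self.heaps[source_pe][offset]


def ring_shift(n_pes):
    """Each PE writes its own rank into the next PE's copy of a symmetric slot."""
    heap = SymmetricHeap(n_pes, heap_size=16)
    dest = heap.malloc(1)  # the same offset is valid on every PE
    for my_pe in range(n_pes):
        peer = (my_pe + 1) % n_pes
        heap.put(dest, my_pe, peer)  # no action is required from the peer
    return [heap.get(dest, pe) for pe in range(n_pes)]


if __name__ == "__main__":
    print(ring_shift(4))  # [3, 0, 1, 2]
```

In this sketch the initiator addresses remote data with nothing more than a symmetric offset and a target PE index, which is the property that lets communication be issued at fine granularity from wherever the data is produced. A real GPU runtime additionally has to handle ordering, completion, and synchronization, which the toy model deliberately ignores.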

Distributed Systems & Hardware

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
