This paper presents a workload characterization of Neural Network Variational Monte Carlo (NNVMC) methods on GPUs, focusing on PauliNet, FermiNet, Psiformer, and Orbformer. It identifies performance bottlenecks in NNVMC stemming from low-intensity elementwise operations and data movement, despite the compute-intensive nature of the underlying physics. The analysis reveals substantial variations in compute/memory balance across different ansätze and computational stages, informing potential algorithm-hardware co-design strategies.
NNVMC's promise for solving quantum many-body problems is currently bottlenecked by surprisingly mundane issues: low-intensity elementwise operations and data movement on GPUs.
Neural Network Variational Monte Carlo (NNVMC) has emerged as a promising paradigm for solving quantum many-body problems by combining variational Monte Carlo with expressive neural-network wave-function ansätze. Although NNVMC can achieve competitive accuracy with favorable asymptotic scaling, practical deployment remains limited by high runtime and memory cost on modern graphics processing units (GPUs). Compared with language and vision workloads, NNVMC execution is shaped by physics-specific stages, including Markov chain Monte Carlo sampling, wave-function construction, and derivative/Laplacian evaluation, which produce heterogeneous kernel behavior and nontrivial bottlenecks. This paper provides a workload-oriented survey and empirical GPU characterization of four representative ansätze: PauliNet, FermiNet, Psiformer, and Orbformer. Using a unified profiling protocol, we analyze model-level runtime and memory trends and kernel-level behavior through family breakdown, arithmetic intensity, roofline positioning, and hardware utilization counters. The results show that end-to-end performance is often constrained by low-intensity elementwise and data-movement kernels, while the compute/memory balance varies substantially across ansätze and stages. Based on these findings, we discuss algorithm-hardware co-design implications for scalable NNVMC systems, including phase-aware scheduling, memory-centric optimization, and heterogeneous acceleration.
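The roofline classification the abstract refers to can be illustrated with a minimal sketch. The idea: a kernel's arithmetic intensity (FLOPs per byte of memory traffic) relative to the hardware's ridge point (peak FLOP/s divided by peak bandwidth) determines whether it is memory-bound or compute-bound. The GPU peak numbers and kernel FLOP/byte counts below are illustrative assumptions, not measurements from the paper.

```python
# Minimal roofline-model sketch: classify a kernel as compute- or
# memory-bound from its arithmetic intensity (FLOPs per byte moved).

def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs executed per byte of DRAM traffic."""
    return flops / bytes_moved

def attainable_flops(ai: float, peak_flops: float, peak_bw: float) -> float:
    """Roofline cap: the lesser of the compute roof and the bandwidth slope."""
    return min(peak_flops, ai * peak_bw)

# Illustrative peaks for a hypothetical data-center GPU (assumed values):
PEAK_FLOPS = 19.5e12   # FLOP/s
PEAK_BW = 1.555e12     # byte/s
ridge = PEAK_FLOPS / PEAK_BW  # ~12.5 FLOP/byte; kernels below this are memory-bound

# Elementwise kernel, e.g. y = a*x + b in fp32: ~2 FLOPs per ~12 bytes of traffic
ew_ai = arithmetic_intensity(2, 12)
# Dense fp32 matmul (N=4096): 2N^3 FLOPs over 3 N^2 matrices of 4-byte elements
N = 4096
mm_ai = arithmetic_intensity(2 * N**3, 3 * 4 * N**2)

for name, ai in [("elementwise", ew_ai), ("matmul", mm_ai)]:
    bound = "memory-bound" if ai < ridge else "compute-bound"
    print(f"{name}: AI={ai:.2f} FLOP/byte, "
          f"cap={attainable_flops(ai, PEAK_FLOPS, PEAK_BW):.2e} FLOP/s -> {bound}")
```

Under these assumptions the elementwise kernel sits far below the ridge point and is bandwidth-limited, matching the paper's observation that low-intensity elementwise and data-movement kernels dominate end-to-end NNVMC cost even though dense linear algebra is compute-rich.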