This paper presents a workload characterization of Neural Network Variational Monte Carlo (NNVMC) methods on GPUs, focusing on PauliNet, FermiNet, Psiformer, and Orbformer. It identifies performance bottlenecks in NNVMC stemming from low-intensity elementwise operations and data movement, despite the compute-intensive nature of the underlying physics. The analysis reveals substantial variations in compute/memory balance across different ansätze and computational stages, informing potential algorithm-hardware co-design strategies.
NNVMC's promise for solving quantum many-body problems is currently bottlenecked by surprisingly mundane issues: low-intensity elementwise operations and data movement on GPUs.
Neural Network Variational Monte Carlo (NNVMC) has emerged as a promising paradigm for solving quantum many-body problems by combining variational Monte Carlo with expressive neural-network wave-function ansätze. Although NNVMC can achieve competitive accuracy with favorable asymptotic scaling, practical deployment remains limited by high runtime and memory cost on modern graphics processing units (GPUs). Compared with language and vision workloads, NNVMC execution is shaped by physics-specific stages, including Markov chain Monte Carlo sampling, wave-function construction, and derivative/Laplacian evaluation, which produce heterogeneous kernel behavior and nontrivial bottlenecks. This paper provides a workload-oriented survey and empirical GPU characterization of four representative ansätze: PauliNet, FermiNet, Psiformer, and Orbformer. Using a unified profiling protocol, we analyze model-level runtime and memory trends and kernel-level behavior through family breakdown, arithmetic intensity, roofline positioning, and hardware utilization counters. The results show that end-to-end performance is often constrained by low-intensity elementwise and data-movement kernels, while the compute/memory balance varies substantially across ansätze and stages. Based on these findings, we discuss algorithm-hardware co-design implications for scalable NNVMC systems, including phase-aware scheduling, memory-centric optimization, and heterogeneous acceleration.
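The roofline classification the abstract refers to can be illustrated with a minimal sketch. The idea: a kernel's arithmetic intensity (FLOPs per byte of memory traffic) relative to the hardware's ridge point (peak FLOP/s divided by peak bandwidth) determines whether it is memory-bound or compute-bound. The GPU peak numbers and kernel FLOP/byte counts below are illustrative assumptions, not measurements from the paper.

```python
# Minimal roofline-model sketch: classify a kernel as compute- or
# memory-bound from its arithmetic intensity (FLOPs per byte moved).

def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs executed per byte of DRAM traffic."""
    return flops / bytes_moved

def attainable_flops(ai: float, peak_flops: float, peak_bw: float) -> float:
    """Roofline cap: the lesser of the compute roof and the bandwidth slope."""
    return min(peak_flops, ai * peak_bw)

# Illustrative peaks for a hypothetical data-center GPU (assumed values):
PEAK_FLOPS = 19.5e12   # FLOP/s
PEAK_BW = 1.555e12     # byte/s
ridge = PEAK_FLOPS / PEAK_BW  # ~12.5 FLOP/byte; kernels below this are memory-bound

# Elementwise kernel, e.g. y = a*x + b in fp32: ~2 FLOPs per ~12 bytes of traffic
ew_ai = arithmetic_intensity(2, 12)
# Dense fp32 matmul (N=4096): 2N^3 FLOPs over 3 N^2 matrices of 4-byte elements
N = 4096
mm_ai = arithmetic_intensity(2 * N**3, 3 * 4 * N**2)

for name, ai in [("elementwise", ew_ai), ("matmul", mm_ai)]:
    bound = "memory-bound" if ai < ridge else "compute-bound"
    print(f"{name}: AI={ai:.2f} FLOP/byte, "
          f"cap={attainable_flops(ai, PEAK_FLOPS, PEAK_BW):.2e} FLOP/s -> {bound}")
```

Under these assumptions the elementwise kernel sits far below the ridge point and is bandwidth-limited, matching the paper's observation that low-intensity elementwise and data-movement kernels dominate end-to-end NNVMC cost even though dense linear algebra is compute-rich.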