This paper introduces a multi-GPU implementation of MBE(3)-OSV-MP2, a local correlation method, to accelerate large-scale ab initio calculations. The implementation addresses GPU parallelization challenges in orbital localization, wave function solution, and CUDA kernel adaptation. The resulting GPU-based MBE(3)-OSV-MP2 achieves O(N^{1.9}) scaling and demonstrates significant speedups (up to 40x compared to canonical RI-MP2) in energy calculations for systems like (H2O)n clusters and a 784-atom insulin peptide.
Achieve sub-quadratic O(N^{1.9}) scaling and up to 40x speedup over canonical RI-MP2 for MP2 calculations on large molecules by unleashing multi-GPU parallelism for local correlation methods.
The computational acceleration of orbital-invariant local correlation methods on graphics processing units (GPUs) has remained largely unexplored due to substantial algorithmic complexities. The runtime efficiency of GPU-implemented local correlation theories can be significantly constrained by how well the orbital localization procedure parallelizes, by the iterative solution of the local wave function, and by the adaptation of CUDA kernels to inherently local or sparse operations. Using second-order M{\o}ller-Plesset perturbation theory (MP2), we present a multi-GPU implementation for large-scale third-order many-body expansion orbital-specific virtual MP2 (MBE(3)-OSV-MP2) energy calculations. Our algorithms and implementation target peak GPU utilization and parallelism for local MP2 computation in several respects, including Jacobi-Pipek-Mezey localization, randomized OSV generation, direct MP2 integral regeneration, and CUDA kernel adaptation to local operations. The GPU-based MBE(3)-OSV-MP2 energy computation achieves $O(N^{1.9})$ scaling and 84\% parallel efficiency up to 24 GPUs distributed over multiple nodes. The present implementation delivers a 40-fold wall-time speedup over canonical RI-MP2 for (H$_2$O)$_{128}$/cc-pVDZ and a 10-fold speedup over the CPU-based MBE(3)-OSV-MP2 for (H$_2$O)$_{190}$/cc-pVDZ. A large-scale computation on a 784-atom insulin peptide yields the full MBE(3)-OSV-MP2 energies in 24 minutes with cc-pVDZ (7571 basis functions) and 6.4 hours with cc-pVTZ (17448 basis functions) on 8 NVIDIA A800 GPUs. Our work opens up new possibilities for performing fast GPU-based local correlation calculations on real-life macromolecules.
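The two headline metrics in the abstract, the scaling exponent in $O(N^{1.9})$ and the parallel efficiency, are conventionally obtained from a log-log fit of wall time against system size and from a strong-scaling ratio. The following is a minimal sketch of those two calculations, not the authors' benchmarking code. The function names `scaling_exponent` and `parallel_efficiency` are hypothetical, and all timings are synthetic values generated from the quoted figures rather than measurements from the paper.

```python
import math


def scaling_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


def parallel_efficiency(t_ref, n_ref, t_par, n_par):
    """Strong-scaling efficiency of a run on n_par GPUs relative to n_ref GPUs."""
    return (t_ref * n_ref) / (t_par * n_par)


# Synthetic timings built from t = c * N**1.9 (assumption, not paper data).
sizes = [1000, 2000, 4000, 8000]
times = [1e-4 * n ** 1.9 for n in sizes]
exponent = scaling_exponent(sizes, times)

# Hypothetical wall times chosen so that 24 GPUs run at 84% efficiency.
t1 = 100.0
t24 = t1 / (24 * 0.84)
efficiency = parallel_efficiency(t1, 1, t24, 24)

print(f"fitted exponent: {exponent:.2f}")
print(f"parallel efficiency on 24 GPUs: {efficiency:.0%}")
```

With measured data, `sizes` would typically hold the number of basis functions or atoms for each benchmark system, and the reference run need not be a single GPU. The abstract does not state which baseline the 84\% figure uses.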