This paper investigates INT8 emulation of FP64 matrix multiplications within the LSMS CPU application using the SCILIB-Accel tool, achieving GPU acceleration without code modification. The study finds that the accuracy of INT8 emulation depends on both the arithmetic precision and the characteristics of the operator, which calls for tunable-precision emulation strategies. The results show that accuracy and performance can potentially improve together, suggesting a viable path for using AI-oriented hardware in HPC.
Accelerate HPC workloads by emulating FP64 operations with INT8 arithmetic on GPUs, and show that performance *and* accuracy can improve together.
This study explores the use of INT8-based emulation to accelerate traditional FP64-based HPC workloads on modern GPU architectures. Using SCILIB-Accel, an automatic BLAS offload tool for cache-coherent Unified Memory Architecture, we emulate the FP64 matrix multiplications in the LSMS CPU application of the MuST suite without code changes. We find that accuracy depends on both the arithmetic precision and the properties of the operator, a dependence that tunable-precision emulation can address. Unlike traditional mixed-precision approaches, this method preserves the original algorithms while optimizing hardware utilization. We demonstrate the potential to improve accuracy and performance at the same time. This work highlights the potential of AI-driven hardware to transform HPC and advocates adaptive precision strategies for future scientific computing.
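The abstract does not spell out the emulation algorithm. One established way to emulate FP64 GEMM on integer units is an Ozaki-style splitting scheme, and the sketch below assumes that approach; it is not the SCILIB-Accel implementation. Each row of A and each column of B is scaled by a power of two and split into a tunable number of 7-bit signed slices that fit in INT8. The slice products are formed as exact integer dot products (the part an INT8 GEMM with INT32 accumulation would perform on a GPU) and are then rescaled and summed in FP64. The number of slices is the precision knob. The names `SLICE_BITS`, `split_rows`, `emulated_matmul`, `exact_matmul`, and `max_rel_error`, and the matrices `A` and `B`, are hypothetical illustrations.

```python
import math
from fractions import Fraction

SLICE_BITS = 7  # hypothetical choice: each slice is a signed integer in [-127, 127], so it fits in INT8


def split_rows(a, num_slices):
    """Split each row of `a` into INT8-range integer slices (Ozaki-style sketch).

    Returns (exponents, slices) such that
    a[i][j] ~= 2**exponents[i] * sum_t slices[t][i][j] * 2**(-SLICE_BITS * (t + 1)).
    """
    exponents = []
    slices = [[[0] * len(a[0]) for _ in a] for _ in range(num_slices)]
    for i, row in enumerate(a):
        # frexp gives max|row| = m * 2**e with 0.5 <= m < 1, so every scaled entry has |x| < 1.
        _, e = math.frexp(max(abs(v) for v in row))
        exponents.append(e)
        for j, v in enumerate(row):
            x = math.ldexp(v, -e)  # exact power-of-two scaling
            for t in range(num_slices):
                x = math.ldexp(x, SLICE_BITS)  # exact shift by 2**7
                d = math.trunc(x)  # integer digit, |d| <= 127
                slices[t][i][j] = d
                x -= d  # exact fractional remainder, |x| < 1
    return exponents, slices


def emulated_matmul(a, b, num_slices):
    """Approximate a @ b with integer slice products plus FP64 rescaling."""
    bt = [list(col) for col in zip(*b)]  # columns of b as rows, for column-wise scaling
    ea, sa = split_rows(a, num_slices)
    eb, sb = split_rows(bt, num_slices)
    m, n, k = len(a), len(bt), len(a[0])
    c = [[0.0] * n for _ in range(m)]
    for t in range(num_slices):
        for u in range(num_slices):
            if t + u >= num_slices:  # truncation: skip the smallest cross terms
                continue
            for i in range(m):
                for l in range(n):
                    # Exact integer dot product; on a GPU this would be an INT8 GEMM with INT32 accumulation.
                    acc = sum(sa[t][i][j] * sb[u][l][j] for j in range(k))
                    scale = ea[i] + eb[l] - SLICE_BITS * (t + u + 2)
                    c[i][l] += math.ldexp(acc, scale)  # rescale and accumulate in FP64
    return c


def exact_matmul(a, b):
    """Exact reference product using rational arithmetic."""
    return [[sum(Fraction(x) * Fraction(y) for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def max_rel_error(approx, exact):
    """Largest absolute error, normalized by the largest exact entry."""
    scale = max(abs(v) for row in exact for v in row)
    worst = max(abs(Fraction(x) - y) for xr, yr in zip(approx, exact) for x, y in zip(xr, yr))
    return float(worst / scale)


A = [[1.0 / 3, -2.5, 1e-3], [7.25, 0.1, -4.0]]
B = [[2.0 / 7, 1.5], [-0.2, 3.0 / 11], [9.0, -1.0 / 13]]

if __name__ == "__main__":
    reference = exact_matmul(A, B)
    for s in range(1, 9):
        print(s, max_rel_error(emulated_matmul(A, B, s), reference))
```

In this sketch, adding a slice buys roughly 7 more bits, and about 8 slices reach FP64-level error for these small inputs. Because each row and column is scaled by its largest entry, the error is bounded relative to those row and column magnitudes rather than to each output entry, so outputs that are small because of cancellation or a wide dynamic range need more slices. This is one way the properties of the operator can enter the accuracy.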