Search papers, labs, and topics across Lattice.
1
0
3
4
Forget slow FP64: this work unlocks efficient double-precision matrix multiplication on modern GPUs by adapting the Ozaki-II scheme to run on faster FP8 hardware.