This paper introduces GEMM-GS, an approach that accelerates 3D Gaussian Splatting (3DGS) by reformulating its blending process into a GEMM-compatible form suitable for Tensor Cores on modern GPUs. The transformation lets the method exploit the high throughput of Tensor Cores, which existing 3DGS pipelines leave unused because they contain no matrix-multiplication operations. The implementation is a high-performance CUDA kernel with a three-stage double-buffered pipeline that overlaps computation and memory access. It achieves a 1.42x speedup over vanilla 3DGS and an additional 1.47x speedup on average when combined with existing acceleration approaches.
GEMM-GS puts a GPU's Tensor Cores to work on 3D Gaussian Splatting through a GEMM-friendly blending transformation, with a reported 1.42x speedup over vanilla 3DGS and a further 1.47x on average on top of existing acceleration approaches.
Neural Radiance Fields (NeRF) enables 3D scene reconstruction from several 2D images but incurs high rendering latency because of its point-sampling design. 3D Gaussian Splatting (3DGS) improves on NeRF with an explicit scene representation and an optimized pipeline, yet still fails to meet practical real-time demands. Existing acceleration works overlook the evolving Tensor Cores of modern GPUs because the 3DGS pipeline lacks General Matrix Multiplication (GEMM) operations. This paper proposes GEMM-GS, an acceleration approach that utilizes Tensor Cores on GPUs via a GEMM-friendly blending transformation. It equivalently reformulates the 3DGS blending process into a GEMM-compatible form. A high-performance CUDA kernel is designed, integrating a three-stage double-buffered pipeline that overlaps computation and memory access. Extensive experiments show that GEMM-GS achieves a $1.42\times$ speedup over vanilla 3DGS and provides an additional $1.47\times$ speedup on average when combined with existing acceleration approaches. Code is released at https://github.com/shieldforever/GEMM-GS.
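To illustrate what a "GEMM-compatible form" of blending can look like, the following is a minimal sketch and not the paper's kernel. It assumes the standard 3DGS exponent, $-\tfrac{1}{2}(a\,d_x^2 + c\,d_y^2) - b\,d_x d_y$, where $(a, b, c)$ is the inverse 2D covariance and $(d_x, d_y)$ is the pixel offset from the Gaussian center. Expanding that quadratic in the pixel coordinates turns the exponent for every pixel-Gaussian pair into one matrix product between a pixel feature matrix and a Gaussian coefficient matrix. The function names (`pixel_features`, `gaussian_coeffs`, `power_gemm`) and the sample values are hypothetical, and the actual factorization used by GEMM-GS may differ.

```python
def pixel_features(px, py):
    # Monomials of the pixel coordinates: one row of the left matrix.
    return [px * px, px * py, py * py, px, py, 1.0]


def gaussian_coeffs(mx, my, a, b, c):
    # Coefficients of those monomials after expanding
    # -0.5*(a*dx^2 + c*dy^2) - b*dx*dy with dx = px - mx, dy = py - my.
    return [
        -0.5 * a,
        -b,
        -0.5 * c,
        a * mx + b * my,
        c * my + b * mx,
        -0.5 * a * mx * mx - 0.5 * c * my * my - b * mx * my,
    ]


def matmul(A, B):
    # Plain (n x k) @ (k x m) product; on a GPU this is the step
    # that would be mapped onto Tensor Cores.
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def power_direct(px, py, mx, my, a, b, c):
    # Reference: the per-pair exponent evaluated the usual way.
    dx, dy = px - mx, py - my
    return -0.5 * (a * dx * dx + c * dy * dy) - b * dx * dy


def power_gemm(pixels, gaussians):
    F = [pixel_features(px, py) for px, py in pixels]     # pixels x 6
    rows = [gaussian_coeffs(*g) for g in gaussians]       # gaussians x 6
    G = [list(col) for col in zip(*rows)]                 # 6 x gaussians
    return matmul(F, G)                                   # pixels x gaussians


# Hypothetical 4x4 pixel tile and two Gaussians (mx, my, a, b, c).
pixels = [(float(x), float(y)) for y in range(4) for x in range(4)]
gaussians = [(1.5, 2.0, 0.8, 0.1, 0.6), (0.25, 3.0, 1.2, -0.3, 0.9)]

P = power_gemm(pixels, gaussians)
max_err = max(
    abs(P[i][j] - power_direct(*pixels[i], *gaussians[j]))
    for i in range(len(pixels))
    for j in range(len(gaussians))
)
```

Here `P[i][j]` holds the exponent for pixel `i` and Gaussian `j`, and `max_err` measures how far the matrix form deviates from the direct evaluation. The sketch covers only the exponent. The opacity scaling, the exponential, and the front-to-back accumulation remain per-element steps, and a real kernel would also have to account for the reduced-precision inputs that Tensor Cores operate on, which this example ignores.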