CUTh-Solver is a GPU-accelerated Preconditioned Conjugate Gradient (PCG)-based sparse solver designed for high-resolution thermal simulation of 3D integrated circuits (ICs). It condenses the Diagonal (DIA) storage format to remove redundancy and uses diagonal-wise SpMV for coalesced memory access, targeting bottlenecks in data storage, memory access, computational efficiency, and hardware utilization. Experimental results show up to 25.8x speedup over GPU-accelerated COMSOL Multiphysics 6.4 and over 3x speedup over NVIDIA's general-purpose libraries (AmgX, cuSPARSE, cuDSS), supporting the effectiveness of its domain-specific optimizations.
CUTh-Solver speeds up 3D IC thermal simulation by up to 25.8x over GPU-accelerated COMSOL Multiphysics 6.4 and by over 3x over NVIDIA's general-purpose GPU solver libraries.
Coarse-grained thermal simulation tends to underestimate localized thermal issues and can miss critical hotspots. Accurate analysis therefore demands fine-grained information, which dramatically increases grid resolution and thus computational workload. Fortunately, the coefficient matrices are often sparse with regular sparsity patterns, offering optimization opportunities. However, existing general-purpose matrix solvers on GPUs rarely exploit these domain-specific properties and consequently encounter bottlenecks in data storage, memory access, parallelism, computational efficiency, and hardware utilization. We propose CUTh-Solver, a co-designed GPU-accelerated Preconditioned Conjugate Gradient (PCG)-based sparse solver framework for Symmetric Positive Definite (SPD) systems arising from high-resolution steady-state and transient 3D IC thermal simulation. For data storage, CUTh-Solver condenses the Diagonal (DIA) storage format to remove redundancy. For memory access, CUTh-Solver employs diagonal-wise SpMV to achieve coalesced access. We further observe a critical conflict between parallelism and preconditioning quality and thus adopt a high-parallelism preconditioning strategy. To improve computational efficiency and hardware utilization, we employ an adaptive fine-grained mixed-precision strategy that leverages diverse floating-point units to avoid resource contention, enhancing throughput without compromising numerical stability. Experimental results show that CUTh-Solver achieves up to 25.8x speedup over GPU-accelerated COMSOL Multiphysics 6.4 and over 3x speedup over NVIDIA's native general-purpose libraries (AmgX, cuSPARSE, cuDSS). Ablation studies validate the individual contribution of each optimization. The code is available at: https://github.com/Chenghan-Wang/CUTh-Solver
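To make the DIA storage and diagonal-wise SpMV ideas concrete, here is a minimal CPU sketch in plain Python. It is not the CUTh-Solver implementation. The function names (`build_dia_1d_laplacian`, `dia_spmv`, `pcg_jacobi`) are hypothetical, the 1D tridiagonal matrix is a toy stand-in for a 3D thermal stencil, and the Jacobi (diagonal) preconditioner is an assumption chosen only because it is trivially parallel; the abstract does not name the preconditioner used. The condensed DIA layout and the mixed-precision strategy are not modeled.

```python
import math


def build_dia_1d_laplacian(n):
    """Toy SPD matrix tridiag(-1, 2, -1) in DIA format.

    data[d][i] stores A[i][i + offsets[d]]; slots that fall outside
    the matrix are zero padding (the redundancy plain DIA carries).
    """
    offsets = [-1, 0, 1]
    data = [
        [0.0] + [-1.0] * (n - 1),  # sub-diagonal, row 0 has no entry
        [2.0] * n,                 # main diagonal
        [-1.0] * (n - 1) + [0.0],  # super-diagonal, last row has no entry
    ]
    return offsets, data


def dia_spmv(offsets, data, x):
    """y = A @ x, sweeping one stored diagonal at a time.

    Within a diagonal, consecutive rows read consecutive entries of
    both diag and x, which is the access pattern that maps to
    coalesced loads when rows are assigned to adjacent GPU threads.
    """
    n = len(x)
    y = [0.0] * n
    for off, diag in zip(offsets, data):
        lo = max(0, -off)      # first row whose column i + off is valid
        hi = min(n, n - off)   # one past the last such row
        for i in range(lo, hi):
            y[i] += diag[i] * x[i + off]
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg_jacobi(offsets, data, b, tol=1e-10, max_iter=1000):
    """Preconditioned CG with a Jacobi preconditioner (an assumption)."""
    n = len(b)
    x = [0.0] * n
    b_norm = math.sqrt(dot(b, b))
    if b_norm == 0.0:
        return x, 0  # zero right-hand side: x = 0 is the solution
    inv_diag = [1.0 / v for v in data[offsets.index(0)]]
    r = list(b)                                   # residual for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        ap = dia_spmv(offsets, data, p)
        alpha = rz / dot(p, ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    offsets, data = build_dia_1d_laplacian(50)
    b = dia_spmv(offsets, data, [1.0] * 50)  # right-hand side for x = ones
    x, iters = pcg_jacobi(offsets, data, b)
    print("iterations:", iters, "max error:", max(abs(v - 1.0) for v in x))
```

A 7-point 3D stencil would use seven offsets (0, ±1, ±nx, ±nx·ny) with the same loop structure. Because every matrix in such a family shares the same offsets, the column index is implied by the diagonal and never stored, which is what makes DIA cheaper than general formats such as CSR for regular grids.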