This paper analyzes the performance of GPU-accelerated CKKS homomorphic encryption under different parameter configurations, highlighting the importance of tailoring optimization strategies to specific workloads. The authors classify existing optimizations by the dataflow characteristics that affect memory footprint, then analyze performance across a range of CKKS parameters and GPU architectures. The results show that the best strategy varies significantly with both CKKS parameters and GPU architecture, with performance differences of up to 1.98x between strategies.
Blindly applying GPU optimizations to homomorphic encryption can leave nearly 2x performance on the table, as the best strategy hinges on CKKS parameters and GPU architecture.
Fully Homomorphic Encryption (FHE) enables secure computation over encrypted data, but its computational cost remains a major obstacle to practical deployment. To mitigate this overhead, many studies have explored GPU acceleration for the CKKS scheme, which is widely used for approximate arithmetic. CKKS parameters are configured for each workload by balancing multiplicative depth, security requirements, and performance. These parameters significantly affect ciphertext size and therefore determine how the memory footprint fits within the GPU memory hierarchy. Nevertheless, prior studies typically apply their proposed optimization methods uniformly, without considering differences in CKKS parameter configurations. In this work, we demonstrate that the optimal GPU optimization strategy for CKKS depends on the CKKS parameter configuration. We first classify prior optimizations by two aspects of dataflow that affect memory footprint, and then conduct both qualitative and quantitative performance analyses. Our analysis shows that even on the same GPU architecture, the optimal strategy varies with CKKS parameters, with performance differences of up to $1.98\times$ between strategies, and that the criteria for selecting an appropriate strategy differ across GPU architectures.
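To make the link between parameters and memory footprint concrete, the following is a rough sketch and not the paper's own model. It assumes an RNS representation with 64-bit words and a ciphertext made of two polynomials, so the size is roughly 2 x N x (number of limbs) x 8 bytes. The function names, the parameter sets, and the 40 MiB cache capacity are all illustrative assumptions.

```python
MIB = 1 << 20

# Assumed on-chip cache capacity, for illustration only (not taken from the paper).
ASSUMED_CACHE_BYTES = 40 * MIB


def ciphertext_bytes(log_n: int, num_limbs: int, word_bytes: int = 8) -> int:
    """Approximate CKKS ciphertext size: 2 polynomials x N coefficients x RNS limbs."""
    n = 1 << log_n  # polynomial degree N = 2^log_n
    return 2 * n * num_limbs * word_bytes


def fits_in_cache(size_bytes: int, cache_bytes: int = ASSUMED_CACHE_BYTES) -> bool:
    """True if a single ciphertext fits entirely in the assumed cache."""
    return size_bytes <= cache_bytes


if __name__ == "__main__":
    # Hypothetical (log N, limb count) pairs spanning small to large configurations.
    for log_n, limbs in [(14, 10), (16, 30), (17, 50)]:
        size = ciphertext_bytes(log_n, limbs)
        print(f"logN={log_n}, limbs={limbs}: {size / MIB:.1f} MiB, "
              f"fits in cache: {fits_in_cache(size)}")
```

Under these assumptions, a small configuration occupies a few MiB while a large one exceeds the assumed cache by a wide margin, which is one way to see why a single optimization strategy may not suit every parameter set.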