This paper presents WHET, a framework that integrates memory-centric, architecture-aware optimizations to improve the performance of fully homomorphic encryption (FHE) on accelerator architectures. To address the large working sets and heavy off-chip memory traffic caused by conventional FHE constructions, the authors introduce techniques such as fine-grained coefficient-to-slot transformation and plaintext compression, which reduce the on-chip data footprint and memory traffic. The result is a 1.38-8.74× per-area performance improvement over existing FHE accelerators and the first sub-millisecond CKKS bootstrapping.
WHET achieves up to 8.74× per-area performance gains for fully homomorphic encryption by aligning cryptographic techniques with hardware capabilities.
Fully homomorphic encryption (FHE) enables computations on encrypted data without decryption, offering strong data privacy at the expense of substantial computational and memory overheads. Prior efforts have steadily improved FHE performance through cryptographic and algorithmic enhancements or hardware acceleration, yet these two directions have progressed largely in isolation, hindering the full exploitation of available hardware capabilities. This work presents WHET, which introduces memory-centric, architecture-aware optimizations to better align cryptographic and algorithmic constructions with FHE accelerator architectures. We identify conventional FHE constructions as major sources of excessive working sets and heavy off-chip memory traffic. We propose accelerator-specific techniques, including fine-grained coefficient-to-slot transformation, plaintext compression, and intermediate modulus raising, to reduce the on-chip data footprint by minimizing temporary ciphertexts and plaintext loads. With these techniques applied, we observe additional opportunities to improve on-chip memory efficiency; hence, we introduce lightweight architectural refinements, including a special-purpose buffer and functional unit extensions. With these optimizations, WHET achieves 1.38-8.74$\times$ per-area performance improvements over state-of-the-art FHE accelerators and the first-ever sub-millisecond CKKS bootstrapping.