The paper proposes a hardware-friendly scheme for generating the uniformly random polynomial 'a' in RLWE-based FHE accelerators, aiming to reduce communication overhead and improve hardware efficiency. The scheme allows parallel generation of uniformly distributed samples with relaxed wiring requirements and unrestricted random access to RNS limbs. It adds less than 3% overhead on the client side during key generation and keeps the power consumption of the PRNG subsystem from scaling prohibitively in high-throughput configurations.
Cut the Watts: This hardware trick slashes power consumption in FHE accelerators by generating randomness on-the-fly, ditching bulky wiring and boosting throughput.
The Ring-Learning With Errors (RLWE) problem forms the backbone of highly efficient Fully Homomorphic Encryption (FHE) schemes. A significant component of the RLWE public key and ciphertext of the form $(b,a)$ is the uniformly random polynomial $a \in R_q$. While essential for security, the communication overhead of transmitting $a$ from client to server, and inputting it into a hardware accelerator, can be substantial, especially for FHE accelerators aiming at high acceleration factors. A known technique for reducing this overhead generates $a$ from a small seed on the client side via a deterministic process, transmits only the seed, and regenerates $a$ on-the-fly within the accelerator. Challenges in the hardware implementation of an accelerator include wiring (density and power), compute area, compute power, and flexibility in the scheduling of on-the-fly generation instructions. This extended abstract proposes a concrete scheme and parameters that address these practical challenges. We detail the benefits of our approach, which maintains the reduction in communication latency and memory footprint while allowing parallel generation of uniformly distributed samples, relaxed wiring requirements, and unrestricted random access to RNS limbs, and which results in an extremely low overhead on the client side (i.e., less than 3%) during the key generation process. The proposed scheme eliminates the need for thick metal layers for randomness distribution and prevents the power consumption of the PRNG subsystem from scaling prohibitively with the acceleration factor, potentially saving tens of Watts per accelerator chip in high-throughput configurations.
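The known seed-expansion technique mentioned in the abstract can be illustrated with a minimal sketch. This is not the scheme proposed in the paper, whose construction and parameters are not given here. It assumes SHAKE-128 as the deterministic expander, rejection sampling to obtain values uniform in $[0, q)$, and a hypothetical domain separation by limb index and block counter so that each RNS limb can be regenerated independently and in any order. The function name `expand_limb`, the block size, and the example moduli are all illustrative choices.

```python
import hashlib


def expand_limb(seed: bytes, limb_index: int, q: int, n: int) -> list[int]:
    """Derive n coefficients, uniform in [0, q), for one RNS limb from a seed.

    Assumption: SHAKE-128 keyed by (seed, limb_index, counter) stands in for
    whatever PRNG an accelerator would actually use.
    """
    bits = (q - 1).bit_length()      # bits needed to represent values below q
    nbytes = (bits + 7) // 8         # bytes consumed per candidate
    mask = (1 << bits) - 1           # drop surplus high bits before comparing
    coeffs: list[int] = []
    counter = 0
    while len(coeffs) < n:
        # Each (limb, counter) pair names an independent block of the stream,
        # so limbs and blocks can be produced in parallel or out of order.
        xof = hashlib.shake_128(
            seed
            + limb_index.to_bytes(2, "little")
            + counter.to_bytes(4, "little")
        )
        block = xof.digest(nbytes * 64)
        for i in range(0, len(block), nbytes):
            candidate = int.from_bytes(block[i:i + nbytes], "little") & mask
            if candidate < q:        # rejection sampling keeps the output uniform
                coeffs.append(candidate)
                if len(coeffs) == n:
                    break
        counter += 1
    return coeffs


# The client sends only `seed`; the server regenerates any limb on demand.
seed = bytes(16)                     # placeholder seed, for illustration only
moduli = [12289, 40961]              # illustrative small NTT-friendly primes
a_limb1 = expand_limb(seed, 1, moduli[1], 256)
a_limb0 = expand_limb(seed, 0, moduli[0], 256)
```

Because every limb is derived from its own domain-separated input, limb 1 can be produced before limb 0 with no shared state, which is the kind of random access the abstract refers to. A hardware design would face the wiring and power trade-offs that this software sketch does not model.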