This paper introduces MORPH, a framework that reformulates zero-knowledge proof (ZKP) kernels to match the architecture of AI ASICs such as TPUs, thereby accelerating ZKP proving. MORPH proposes "Big-T complexity," a hardware-aware complexity model, and uses it to guide optimizations at both the arithmetic level (an MXU-centric extended-RNS lazy reduction) and the dataflow level (a unified-sharding, layout-stationary TPU Pippenger MSM and an optimized NTT). Experiments on TPUv6e8 show up to 10x higher NTT throughput than GZKP and comparable MSM throughput.
ZKP proving is bottlenecked by MSM and NTT operations; a framework that reformulates these kernels for AI-ASIC execution lets TPUs reach up to 10x higher NTT throughput than GZKP, with comparable MSM throughput.
Zero-knowledge proof (ZKP) provers remain costly because multi-scalar multiplication (MSM) and number-theoretic transforms (NTTs) are computationally intensive and dominate runtime. AI ASICs such as TPUs provide massive matrix throughput and state-of-the-art energy efficiency. We present MORPH, the first framework that reformulates ZKP kernels to match AI-ASIC execution. We introduce Big-T complexity, a hardware-aware complexity model that exposes heterogeneous bottlenecks and layout-transformation costs ignored by Big-O. Guided by this analysis, (1) at the arithmetic level, MORPH develops an MXU-centric extended-RNS lazy reduction that converts high-precision modular arithmetic into dense low-precision GEMMs, eliminating all carry chains, and (2) at the dataflow level, MORPH constructs a unified-sharding, layout-stationary TPU Pippenger MSM and an optimized 3/5-step NTT that avoid on-TPU shuffles to minimize costly memory reorganization. Implemented in JAX, MORPH enables TPUv6e8 to achieve up to 10x higher throughput on NTT than GZKP and comparable throughput on MSM. Our code: https://github.com/EfficientPPML/MORPH.
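To make the idea of recasting an NTT as dense matrix products more concrete, the following is a minimal sketch of a generic three-step decomposition (column transforms, twiddle multiplication, row transforms). It is not taken from the MORPH code base. The toy prime 257, the generator 3, and all function names are assumptions chosen for illustration, and plain Python stands in for JAX so that the example has no dependencies. Real ZKP fields are far wider, which is where the extended-RNS arithmetic described in the abstract would come in.

```python
P = 257  # toy NTT-friendly prime (assumption); real ZKP fields are much wider
G = 3    # a primitive root modulo 257


def root_of_unity(n):
    """Return a primitive n-th root of unity mod P (n must divide P - 1)."""
    assert (P - 1) % n == 0
    return pow(G, (P - 1) // n, P)


def naive_ntt(x):
    """Reference O(n^2) NTT: X[k] = sum_j x[j] * w^(j*k) mod P."""
    n = len(x)
    w = root_of_unity(n)
    return [sum(x[j] * pow(w, j * k, P) for j in range(n)) % P for k in range(n)]


def matmul_mod(a, b):
    """Dense matrix product mod P (the part a matrix unit would execute)."""
    return [[sum(u * v for u, v in zip(row, col)) % P for col in zip(*b)]
            for row in a]


def dft_matrix(n, w):
    """n x n matrix with entries w^(i*j) mod P."""
    return [[pow(w, i * j, P) for j in range(n)] for i in range(n)]


def three_step_ntt(x, rows, cols):
    """NTT of length rows*cols built from two matrix products and one twiddle pass."""
    n = rows * cols
    assert len(x) == n
    w = root_of_unity(n)
    # View x as a rows x cols matrix in row-major order: m[j1][j2] = x[cols*j1 + j2].
    m = [x[r * cols:(r + 1) * cols] for r in range(rows)]
    # Step 1: length-`rows` NTT down every column, as one matrix product.
    a = matmul_mod(dft_matrix(rows, pow(w, cols, P)), m)
    # Step 2: elementwise twiddle factors w^(k1*j2).
    b = [[a[k1][j2] * pow(w, k1 * j2, P) % P for j2 in range(cols)]
         for k1 in range(rows)]
    # Step 3: length-`cols` NTT along every row, as a second matrix product.
    y = matmul_mod(b, dft_matrix(cols, pow(w, rows, P)))
    # Output index k = k1 + rows*k2 sits at y[k1][k2]. Reading it back in natural
    # order is the transpose that a four-step NTT performs explicitly; it is done
    # here only so the result can be compared with the reference.
    return [y[k % rows][k // rows] for k in range(n)]
```

For a 16-point input, `three_step_ntt(list(range(16)), 4, 4)` should agree with `naive_ntt(list(range(16)))`. On a matrix accelerator the two `matmul_mod` calls are the natural targets for the matrix unit, and leaving the result in the `y` layout, rather than reordering it, is one way to avoid a shuffle.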