This paper introduces DigitsOnTurbo (DoT), an approach to large-number arithmetic that restructures computations to maximize SIMD parallelism on modern CPUs. Rather than vectorizing traditional algorithms, whose inherent dependencies have limited SIMD adoption, DoT builds on independent, data-parallel operations. When integrated into existing libraries, it achieves up to 4x speedup for addition and subtraction and up to 2x for multiplication, which translates into measurable gains in scientific and cryptographic applications.
Rethinking large-number arithmetic around data-parallel operations lets SIMD deliver substantial speedups, with end-to-end throughput gains of up to 19.3% in scientific computing.
Large-number arithmetic, widely used in scientific computing and cryptography, has seen limited adoption of single instruction, multiple data (SIMD) parallelism on modern CPUs due to the inherent dependencies in traditional algorithms. We present DigitsOnTurbo (DoT), which restructures the computation around independent, data-parallel operations rather than vectorizing the standard algorithms, and thereby makes effective use of SIMD. Over prior SIMD implementations, DoT achieves speedups of up to 1.85x for addition and subtraction, and up to 2.3x for multiplication. When integrated into state-of-the-art libraries, DoT yields up to 4x speedup for addition and subtraction, and up to 2x speedup for multiplication. These gains cascade into end-to-end throughput improvements of up to 19.3% for scientific computations, and up to 7.9% latency and 5.9% throughput improvements for cryptographic implementations.
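To make the "inherent dependencies" concrete, the sketch below contrasts schoolbook multi-limb addition, where each limb waits on the previous limb's carry, with a two-phase variant whose first phase is independent per limb. This is a generic carry-lookahead illustration and is not claimed to be DoT's actual algorithm; the function names, the 64-bit limb size, and the use of Python lists in place of SIMD lanes are all assumptions made for illustration.

```python
MASK = (1 << 64) - 1  # assumed limb width: 64 bits, least significant limb first


def add_sequential(a, b):
    """Schoolbook addition: limb i cannot finish until limb i-1 yields its carry."""
    out, carry = [], 0
    for x, y in zip(a, b):
        t = x + y + carry
        out.append(t & MASK)
        carry = t >> 64
    return out, carry


def add_data_parallel(a, b):
    """Illustrative two-phase addition (hypothetical helper, not DoT itself)."""
    n = len(a)
    # Phase 1: every limb is processed independently, as SIMD lanes would be.
    s = [(x + y) & MASK for x, y in zip(a, b)]
    # Bit i of g: limb i overflowed on its own and generates a carry.
    g = sum((s[i] < a[i]) << i for i in range(n))
    # Bit i of p: limb i is all ones, so an incoming carry would pass through it.
    p = sum((s[i] == MASK) << i for i in range(n))
    # Phase 2: one integer addition over the bit masks resolves every carry chain.
    # Bit i of c is set when limb i receives a carry-in; bit n is the carry-out.
    c = ((g << 1) + p) ^ p
    out = [(s[i] + ((c >> i) & 1)) & MASK for i in range(n)]
    return out, (c >> n) & 1


a = [MASK, MASK, 0]
b = [1, 0, 0]
print(add_sequential(a, b) == add_data_parallel(a, b))
```

In the second function only the short mask computation in phase 2 crosses limb boundaries. The per-limb sums, comparisons, and final carry increments have no cross-limb dependency, which is the property that makes a computation amenable to SIMD execution.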