This paper introduces a gate fusion technique for both the forward and backward passes in classical simulation of quantum machine learning, aiming to reduce the memory access bottleneck and improve throughput. The method fuses consecutive gates to minimize global memory accesses. Experiments show roughly a 20x throughput improvement for a Hardware-Efficient Ansatz with 12 or more qubits, rising to over 30x on a mid-range consumer GPU with limited memory bandwidth. Combining gate fusion with gradient checkpointing makes it possible to train a 20-qubit, 1,000-layer model with 60,000 parameters on 1,000 samples in about 20 minutes.
Train a 20-qubit, 1,000-layer quantum machine learning model with 60,000 parameters on 1,000 samples in about 20 minutes using classical simulation, thanks to a novel gate fusion technique.
Real quantum devices have increasingly been used in recent years for research aimed at quantum advantage or quantum utility. Even so, executing deep quantum circuits or performing quantum machine learning with large-scale data on current noisy intermediate-scale quantum devices remains challenging, which makes classical simulation essential for quantum machine learning research. However, classical simulation often suffers from the cost of gradient calculations, which require enormous memory or computational time. To address these problems, we propose a method that fuses multiple consecutive gates in each of the forward and backward passes, improving throughput by minimizing global memory accesses. As a result, we achieve approximately $20$ times higher throughput for a Hardware-Efficient Ansatz with $12$ or more qubits, and over $30$ times on a mid-range consumer GPU with limited memory bandwidth. Combining the proposed method with gradient checkpointing drastically reduces memory usage, making it possible to train a large-scale quantum machine learning model (a $20$-qubit, $1{,}000$-layer model with $60{,}000$ parameters) using $1{,}000$ samples in approximately $20$ minutes. This implies that the model can be trained on large datasets consisting of tens of thousands of samples, such as MNIST or CIFAR-10, within a realistic time frame (e.g., $20$ hours per epoch). In this way, the proposed method drastically accelerates classical simulation of quantum machine learning and contributes to research on quantum machine learning and variational quantum algorithms, for example by enabling the verification of algorithms on large datasets and the investigation of learning-theoretic phenomena in deep quantum circuits, such as barren plateaus.
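The core idea of gate fusion can be illustrated with a small NumPy sketch. This is not the authors' GPU implementation, which fuses gates inside kernels to cut global memory traffic. It only shows the underlying algebra under a simplifying assumption: consecutive single-qubit gates acting on the same qubit can be multiplied into one $2 \times 2$ matrix, so the state vector is traversed once instead of once per gate. The function names (`apply_1q`, `apply_sequential`, `apply_fused`) are hypothetical and chosen for illustration.

```python
import numpy as np

def rx(theta):
    """Single-qubit rotation about the X axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]], dtype=complex)

def rz(theta):
    """Single-qubit rotation about the Z axis."""
    return np.array([[np.exp(-1j * theta / 2), 0],
                     [0, np.exp(1j * theta / 2)]], dtype=complex)

def apply_1q(state, gate, qubit, n):
    """Apply a 2x2 gate to one qubit; this is one full pass over the state vector."""
    psi = state.reshape([2] * n)
    # Contract the gate's input index with the target qubit's axis.
    psi = np.tensordot(gate, psi, axes=([1], [qubit]))
    # tensordot puts the new axis first; move it back to the qubit's position.
    psi = np.moveaxis(psi, 0, qubit)
    return psi.reshape(-1)

def apply_sequential(state, gates, qubit, n):
    """Unfused baseline: one pass over the state vector per gate."""
    for g in gates:
        state = apply_1q(state, g, qubit, n)
    return state

def apply_fused(state, gates, qubit, n):
    """Fused version: multiply the small matrices first, then make a single pass."""
    fused = np.eye(2, dtype=complex)
    for g in gates:
        fused = g @ fused  # later gates multiply from the left
    return apply_1q(state, fused, qubit, n)

if __name__ == "__main__":
    n = 4
    rng = np.random.default_rng(0)
    state = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
    state /= np.linalg.norm(state)
    gates = [rx(0.3), rz(1.1), rx(-0.7)]
    out_seq = apply_sequential(state, gates, qubit=2, n=n)
    out_fused = apply_fused(state, gates, qubit=2, n=n)
    print(np.allclose(out_seq, out_fused))
```

In this toy version the saving is three passes over a $2^n$-element array reduced to one. The abstract attributes the reported speedups to the same kind of reduction in global memory accesses on the GPU, applied in both the forward and backward passes.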