This paper analyzes the query complexity of inference in quantum kernel methods, focusing on estimating the weighted sum of kernel values. It identifies two axes for improvement: kernel value estimation (sampling vs. quantum amplitude estimation) and summation approximation (term-by-term vs. single observable). The authors achieve a query-optimal complexity of $O(\lVert\alpha\rVert_1/\varepsilon)$ by encoding the inference sum as a single observable and applying quantum amplitude estimation, and prove a matching lower bound.
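Inference in quantum kernel methods can be sped up quadratically by encoding the weighted kernel sum as a single observable and estimating it with quantum amplitude estimation, rather than estimating each kernel value separately; the resulting query complexity is optimal up to logarithmic factors.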
Quantum kernel methods are among the leading candidates for achieving quantum advantage in supervised learning. A key bottleneck is the cost of inference: evaluating a trained model on new data requires estimating a weighted sum $\sum_{i=1}^N \alpha_i k(x,x_i)$ of $N$ kernel values to additive precision $\varepsilon$, where $\alpha$ is the vector of trained coefficients. The standard approach estimates each term independently via sampling, yielding a query complexity of $O(N\lVert\alpha\rVert_2^2/\varepsilon^2)$. In this work, we identify two independent axes for improvement: (1) how individual kernel values are estimated (sampling versus quantum amplitude estimation), and (2) how the sum is approximated (term-by-term versus via a single observable), and systematically analyze all combinations thereof. The query-optimal combination, encoding the full inference sum as the expectation value of a single observable and applying quantum amplitude estimation, achieves a query complexity of $O(\lVert\alpha\rVert_1/\varepsilon)$, removing the dependence on $N$ from the query count and yielding a quadratic improvement in both $\lVert\alpha\rVert_1$ and $\varepsilon$. We prove a matching lower bound of $\Omega(\lVert\alpha\rVert_1/\varepsilon)$, establishing query-optimality of our approach up to logarithmic factors. Beyond query complexity, we also analyze how these improvements translate into gate costs and show that the query-optimal strategy is not always optimal in practice from the perspective of gate complexity. Our results provide both a query-optimal algorithm and a practically optimal choice of strategy depending on hardware capabilities, along with a complete landscape of intermediate methods to guide practitioners. All algorithms require only amplitude estimation as a subroutine and are thus natural candidates for early-fault-tolerant implementations.
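To get a feel for the gap between the two endpoint scalings quoted above, the following minimal Python sketch evaluates $N\lVert\alpha\rVert_2^2/\varepsilon^2$ and $\lVert\alpha\rVert_1/\varepsilon$ for a toy coefficient vector. It is only an illustration under stated assumptions: the function name `query_scalings`, the example vector, and the choice to drop all constants and logarithmic factors are ours, not the paper's, so the numbers indicate relative scaling and not actual query counts. The check at the end uses the Cauchy-Schwarz inequality $\lVert\alpha\rVert_1^2 \le N\lVert\alpha\rVert_2^2$, which is why the squared single-observable count never exceeds the term-by-term sampling count.

```python
import math


def query_scalings(alpha, eps):
    """Leading-order query scalings for two inference strategies.

    Constants and logarithmic factors are dropped (an assumption made
    for illustration), so only the ratio between entries is meaningful.
    """
    n = len(alpha)
    l1 = sum(abs(a) for a in alpha)      # ||alpha||_1
    l2_sq = sum(a * a for a in alpha)    # ||alpha||_2^2
    return {
        # Term-by-term estimation via sampling: N * ||alpha||_2^2 / eps^2
        "termwise_sampling": n * l2_sq / eps**2,
        # Single observable + amplitude estimation: ||alpha||_1 / eps
        "single_observable_qae": l1 / eps,
    }


# Hypothetical trained coefficients, chosen only for this example.
alpha = [0.5, -0.25, 0.125, 0.125]
counts = query_scalings(alpha, eps=0.01)

# Cauchy-Schwarz: ||alpha||_1^2 <= N * ||alpha||_2^2, so squaring the
# optimal scaling never exceeds the standard one (up to rounding).
assert counts["single_observable_qae"] ** 2 <= counts["termwise_sampling"] * (1 + 1e-12)

for name, value in counts.items():
    print(f"{name}: {value:.0f}")
```

For this toy vector the two scalings differ by more than two orders of magnitude at $\varepsilon = 0.01$, and the gap widens as $\varepsilon$ shrinks or $N$ grows. As the abstract notes, this comparison covers query counts only and says nothing about gate costs.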