May 6, 2026 · arXiv:2605.04543

UniVer: A Unified Perspective for Multi-step and Multi-draft Speculative Decoding

AI Summary

This paper introduces UniVer, a speculative decoding verification algorithm that unifies multi-step and multi-draft verification by framing tree-based verification as a conditional Optimal Transport (OT) problem. UniVer uses prefix acceptance probabilities as dynamic scaling factors to guide draft selection, which enables joint optimization across tree levels. The authors prove that UniVer is lossless and achieves the optimal acceptance rate under the proposed conditional framework. Experiments show acceptance-length improvements of 4.2% to 8.5% over standard recursive rejection sampling without replacement, while the output distribution stays exactly aligned with the target model.

Key Contribution

UniVer jointly optimizes multi-step and multi-draft verification in speculative decoding, improving acceptance length by 4.2% to 8.5% over standard recursive rejection sampling without replacement while remaining lossless.

Abstract

Speculative decoding accelerates Large Language Models via draft-then-verify, where verification can be framed as an Optimal Transport (OT) problem. Existing approaches typically handle multi-draft and multi-step aspects in isolation, applying either flat OT to single-step drafts or per-token rejection sampling to tree-structured candidates. This separation leaves the joint regime (where multi-step dependencies meet multi-draft branching) poorly optimized, as local verification rules fail to exploit the coupling between horizontal and vertical dimensions of candidate trees. In this paper, we propose a unified perspective that casts tree-based verification as a conditional OT problem. Our key insight is that vertical dependencies can be abstracted through prefix acceptance probabilities, which act as dynamic scaling factors to actively guide horizontal draft selection. Based on this principle, we introduce UniVer, a verification algorithm that jointly optimizes across tree levels by composing local optimal transport plans under prefix constraints. We prove that UniVer remains lossless and achieves the optimal acceptance rate under the proposed conditional framework. Extensive experiments across different tasks and models demonstrate that UniVer improves acceptance length by 4.2% to 8.5% over standard recursive rejection sampling without replacement, while maintaining exact distributional alignment with the target model.
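For orientation, the baseline named in the abstract, recursive rejection sampling without replacement, can be sketched for a single decoding step with k drafts. This is a minimal Python sketch of that baseline only, not of UniVer. The function names, the toy distributions, and the assumption that k does not exceed the number of tokens with nonzero draft probability are illustrative and are not taken from the paper.

```python
import numpy as np


def sample_drafts_without_replacement(q, k, rng):
    """Draw k distinct draft tokens from q, renormalizing after each draw.

    Assumes k is at most the number of tokens with nonzero mass in q.
    """
    q_i = np.asarray(q, dtype=float).copy()
    drafts = []
    for _ in range(k):
        q_i = q_i / q_i.sum()
        x = int(rng.choice(len(q_i), p=q_i))
        drafts.append(x)
        q_i[x] = 0.0  # a token can be drafted only once
    return drafts


def verify_rrs(p, q, drafts, rng):
    """Verify drafts in order; return (token, accepted_a_draft).

    p is the target distribution and q the draft distribution for this step.
    """
    p_i = np.asarray(p, dtype=float).copy()
    q_i = np.asarray(q, dtype=float).copy()
    for x in drafts:
        # Accept x with probability min(1, p_i[x] / q_i[x]).
        if rng.random() * q_i[x] < p_i[x]:
            return x, True
        # Rejected: the target becomes the normalized residual max(p - q, 0).
        residual = np.maximum(p_i - q_i, 0.0)
        p_i = residual / residual.sum()
        # The draft distribution drops x, matching how the next draft was drawn.
        q_i[x] = 0.0
        total = q_i.sum()
        if total > 0:
            q_i = q_i / total
    # Every draft was rejected: sample from the final residual distribution.
    return int(rng.choice(len(p_i), p=p_i)), False


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = np.array([0.5, 0.3, 0.15, 0.05])  # toy target distribution
    q = np.array([0.1, 0.2, 0.3, 0.4])    # toy draft distribution
    drafts = sample_drafts_without_replacement(q, 2, rng)
    print(drafts, verify_rrs(p, q, drafts, rng))
```

Each rejection replaces the target with its residual, so the emitted token is still distributed according to p. The verification rule at each draft is local, which is the property the abstract contrasts with UniVer's joint optimization across tree levels.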

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 28

Year: 2026

Venue: N/A
