Technion | Apr 14, 2026 | arXiv:2604.12989

Accelerating Speculative Decoding with Block Diffusion Draft Trees

Liran Ringel, Yaniv Romano

AI Summary

This paper introduces Diffusion Draft Tree (DDTree), a speculative decoding method that builds a tree of draft continuations from the output of a block diffusion drafter. Under a fixed node budget, DDTree uses a best-first heap algorithm to select the continuations most likely to match the target model, so that multiple trajectories can be verified in a single forward pass of the target model. Because it builds on the DFlash drafter, DDTree ranks among the leading approaches to speculative decoding.
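The best-first selection can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the drafter yields an independent token distribution for each block position, scores a path by the sum of its tokens' log-probabilities (an assumed form of the surrogate), and caps the tree at a fixed node budget. The names `build_draft_tree`, `position_probs`, and `budget` are hypothetical.

```python
import heapq
import math


def build_draft_tree(position_probs, budget):
    """Select up to `budget` tree nodes in order of decreasing path score.

    position_probs: list with one dict {token: probability} per block position
                    (assumed input format, not taken from the paper).
    Returns a list of (parent_index, token, log_score); parent_index is -1
    for nodes that hang directly off the root (the accepted prefix).
    """
    # Rank the tokens of each position by descending log-probability.
    ranked = []
    for dist in position_probs:
        items = sorted(
            ((tok, math.log(p)) for tok, p in dist.items() if p > 0),
            key=lambda item: -item[1],
        )
        if not items:
            break  # no usable candidates from this position onward
        ranked.append(items)

    nodes = []
    if budget <= 0 or not ranked:
        return nodes

    # Heap entry: (-score, tiebreak, parent_index, depth, rank, parent_score).
    counter = 0
    heap = [(-ranked[0][0][1], counter, -1, 0, 0, 0.0)]

    while heap and len(nodes) < budget:
        neg_score, _, parent, depth, rank, parent_score = heapq.heappop(heap)
        score = -neg_score
        index = len(nodes)
        nodes.append((parent, ranked[depth][rank][0], score))

        # Next-best sibling: same parent, next token at the same position.
        if rank + 1 < len(ranked[depth]):
            counter += 1
            sibling_score = parent_score + ranked[depth][rank + 1][1]
            heapq.heappush(
                heap, (-sibling_score, counter, parent, depth, rank + 1, parent_score)
            )

        # Best child: top token at the next position.
        if depth + 1 < len(ranked):
            counter += 1
            child_score = score + ranked[depth + 1][0][1]
            heapq.heappush(heap, (-child_score, counter, index, depth + 1, 0, score))

    return nodes


# Toy example with a two-position block and a budget of four nodes.
toy_tree = build_draft_tree([{"a": 0.6, "b": 0.4}, {"c": 0.9, "d": 0.1}], budget=4)
```

Each popped node pushes only its next-best sibling and its best child, so the heap stays small and nodes come out in non-increasing score order. A parent is always emitted before its children, which keeps the parent indices valid for later steps.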

Key Contribution

DDTree extends block diffusion drafting from a single drafted trajectory to a tree of candidates, so that one forward pass of the target model can verify multiple likely continuations in parallel.

Abstract

Speculative decoding accelerates autoregressive language models by using a lightweight drafter to propose multiple future tokens, which the target model then verifies in parallel. DFlash shows that a block diffusion drafter can generate an entire draft block in a single forward pass and achieve state-of-the-art speculative decoding performance, outperforming strong autoregressive drafters such as EAGLE-3. Vanilla DFlash, however, still verifies only a single drafted trajectory per round, potentially limiting its acceptance length. We introduce DDTree (Diffusion Draft Tree), a method that constructs a draft tree directly from the per-position distributions of a block diffusion drafter. Under a fixed node budget, DDTree uses a simple best-first heap algorithm to select the continuations that are most likely to match the target model according to a surrogate defined by the draft model's output. The resulting tree is verified efficiently in a single target model forward pass using an ancestor-only attention mask. Because DDTree builds on DFlash, a leading draft model for speculative decoding, these gains place DDTree among the leading approaches to speculative decoding.
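An ancestor-only attention mask can be illustrated with a small sketch. It assumes the tree is stored as a hypothetical `parents` list in which each node's parent precedes it and `-1` marks a node attached directly to the root; the paper's actual data layout is not given here.

```python
def ancestor_only_mask(parents):
    """Build a boolean mask where mask[i][j] is True iff node j is node i
    itself or one of its ancestors in the draft tree.

    parents: list of parent indices, with -1 for nodes attached to the root
             (assumed representation, chosen for illustration).
    """
    n = len(parents)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j = i
        while j != -1:  # walk up from node i to the root
            mask[i][j] = True
            j = parents[j]
    return mask


# Toy tree with two branches: 0 -> 1 and 2 -> 3.
toy_mask = ancestor_only_mask([-1, 0, -1, 2])
```

Row `i` is True only at node `i` and its ancestors, so sibling branches cannot attend to each other and each root-to-node path is scored as if it were decoded on its own. In typical tree-attention setups every node also attends to the already-accepted prefix, which this sketch leaves out.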

Architecture Design (Transformers, SSMs, MoE) | Inference & Quantization | Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 27

Year: 2026

Venue: N/A
