May 28, 2026

arXiv:2605.29613

Decoding Strategies for Diffusion-Based ASR: A Systematic Evaluation of Confidence-Based Thresholding

J. Yeo, Minsu Kim, Hyeongseop Rha, Y. Ro

AI Summary

This paper investigates decoding strategies for diffusion language models (DLMs) in the context of automatic speech recognition (ASR), comparing fixed-number decoding rounds against static and dynamic confidence thresholding. Using negative log-likelihood as a proxy for decoding progress, the authors demonstrate that confidence-based thresholding significantly improves both accuracy and speed compared to fixed-number approaches. The key finding is that a static confidence threshold can match the accuracy of autoregressive ASR while achieving superior efficiency due to the early convergence of most tokens in ASR tasks.
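As a rough illustration of how negative log-likelihood could serve as a per-round progress signal, the sketch below averages the NLL of the currently predicted tokens. The summary does not give the paper's exact definition, so the function name `mean_nll`, the use of top-1 token probabilities, and the toy probabilities are all assumptions made for illustration.

```python
import math
from typing import Sequence


def mean_nll(token_probs: Sequence[float], eps: float = 1e-12) -> float:
    """Average negative log-likelihood of the predicted tokens.

    Hypothetical helper: token_probs holds the model's probability for the
    token it currently predicts at each position. Lower values mean the
    model is more certain overall.
    """
    # Clamp at eps so a zero probability does not raise a math domain error.
    return -sum(math.log(max(p, eps)) for p in token_probs) / len(token_probs)


# Toy probabilities for three decoding rounds (made-up numbers).
rounds = [
    [0.50, 0.50, 0.50],
    [0.90, 0.90, 0.50],
    [0.99, 0.99, 0.95],
]
nll_per_round = [mean_nll(r) for r in rounds]
for idx, value in enumerate(nll_per_round):
    print(f"round {idx}: mean NLL = {value:.4f}")
```

In this toy trace the mean NLL falls as more tokens become confident, which is the kind of trend a progress proxy would track.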

Key Contribution

Ditch fixed-number decoding for diffusion-based ASR: confidence-based thresholds reach autoregressive-level accuracy while keeping the parallelism of diffusion decoding.

Abstract

While LLM-based Automatic Speech Recognition (ASR) achieves high accuracy, its speed is limited by sequential autoregressive decoding. Diffusion Language Models (DLMs) offer a parallel alternative, yet their decoding strategies remain under-explored in ASR contexts. This paper analyzes three decoding schemes for DLM-based ASR: fixed-number, static confidence threshold, and dynamic confidence threshold. We propose measuring round-wise accuracy using Negative Log-Likelihood-based uncertainty as a proxy for decoding progress. Our results show that both threshold-based strategies significantly outperform fixed-number schemes in accuracy and speed. We attribute this to a property unique to ASR: most tokens reach high confidence early, allowing reliable ones to be harvested aggressively while leaving only difficult tokens for later rounds. Notably, the static-threshold strategy matches the accuracy of autoregressive decoding while offering superior efficiency.
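The three decoding schemes named in the abstract differ mainly in how they choose which masked positions to commit in each round. The sketch below is a minimal illustration of that choice, not the authors' implementation. The function names, the fallback that commits at least one token, and the linear decay used for the dynamic threshold (`tau_start`, `decay`, `tau_min`) are all assumptions, since the abstract does not specify them.

```python
from typing import List, Sequence


def select_fixed_number(conf: Sequence[float], masked: Sequence[int], k: int) -> List[int]:
    """Fixed-number scheme: commit the k most confident masked positions."""
    ranked = sorted(masked, key=lambda i: conf[i], reverse=True)
    return sorted(ranked[:k])


def select_static_threshold(conf: Sequence[float], masked: Sequence[int], tau: float) -> List[int]:
    """Static threshold: commit every masked position with confidence >= tau."""
    chosen = [i for i in masked if conf[i] >= tau]
    if not chosen and masked:
        # Assumed fallback: commit the single best position so decoding always advances.
        chosen = [max(masked, key=lambda i: conf[i])]
    return chosen


def select_dynamic_threshold(
    conf: Sequence[float],
    masked: Sequence[int],
    round_idx: int,
    tau_start: float = 0.95,
    decay: float = 0.05,
    tau_min: float = 0.5,
) -> List[int]:
    """Dynamic threshold: here, a threshold that relaxes linearly per round (an assumed schedule)."""
    tau = max(tau_min, tau_start - decay * round_idx)
    return select_static_threshold(conf, masked, tau)


# Toy per-position confidences for a 5-token transcript (made-up numbers).
conf = [0.99, 0.97, 0.42, 0.98, 0.61]
masked = [0, 1, 2, 3, 4]

print(select_fixed_number(conf, masked, k=2))
print(select_static_threshold(conf, masked, tau=0.9))
print(select_dynamic_threshold(conf, [2, 4], round_idx=7))
```

With these toy confidences, the threshold rules commit all three high-confidence positions in one round, whereas the fixed-number rule with `k=2` leaves one of them for a later round. This mirrors the abstract's explanation that most ASR tokens become reliable early.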

Inference & Quantization, Natural Language Processing, Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 18

Year: 2026

Venue: N/A
