The paper introduces self-speculative decoding, a novel approach for accelerating LLM-based ASR by leveraging a CTC encoder as a draft model. The method selectively accepts or verifies CTC hypotheses based on frame entropy and token likelihoods, reducing the computational cost of auto-regressive decoding. Experiments across multiple languages and datasets demonstrate a significant speedup (a 4.4x improvement in inverse real-time factor) and a new state-of-the-art 5.58% WER on the HuggingFace Open ASR benchmark, with only a small WER increase compared to full AR decoding.
LLM-based ASR can be accelerated 4.4x with minimal accuracy loss by using a CTC encoder to speculatively generate draft transcriptions.
We propose self-speculative decoding for speech-aware LLMs by using the CTC encoder as a draft model to accelerate auto-regressive (AR) inference and improve ASR accuracy. Our three-step procedure works as follows: (1) if the frame entropies of the CTC output distributions are below a threshold, the greedy CTC hypothesis is accepted as final; (2) otherwise, the CTC hypothesis is verified in a single LLM forward pass using a relaxed acceptance criterion based on token likelihoods; (3) if verification fails, AR decoding resumes from the accepted CTC prefix. Experiments on nine corpora and five languages show that this approach can simultaneously accelerate decoding and reduce WER. On the HuggingFace Open ASR benchmark with a 1B-parameter LLM and a 440M-parameter CTC encoder, we achieve a record 5.58% WER and improve the inverse real-time factor by a factor of 4.4, with only a 12% relative WER increase over AR search. Code and model weights are publicly available under a permissive license.
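The three-step procedure above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the entropy threshold, the acceptance margin, and the stubbed `llm_token_probs` and `ar_decode` callables are all hypothetical stand-ins for the real CTC encoder and LLM components.

```python
import math

ENTROPY_THRESHOLD = 0.2   # hypothetical tuning value, not from the paper
ACCEPT_MARGIN = 0.5       # hypothetical relaxed-acceptance likelihood bound

def entropy(dist):
    """Shannon entropy of one CTC frame's output distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)

def self_speculative_decode(ctc_dists, ctc_hyp, llm_token_probs, ar_decode):
    """Sketch of the three-step procedure:
    (1) accept the greedy CTC hypothesis outright if all frame entropies
        are below a threshold;
    (2) otherwise verify the draft in one LLM forward pass with a relaxed
        token-likelihood criterion;
    (3) on failure, resume AR decoding from the accepted prefix."""
    # Step 1: entropy gate over the CTC output distributions
    if all(entropy(d) < ENTROPY_THRESHOLD for d in ctc_dists):
        return ctc_hyp
    # Step 2: one call to the LLM scores every draft token at once;
    # keep tokens while their likelihood clears the relaxed bound
    probs = llm_token_probs(ctc_hyp)
    accepted = []
    for tok, p in zip(ctc_hyp, probs):
        if p >= ACCEPT_MARGIN:
            accepted.append(tok)
        else:
            break
    if len(accepted) == len(ctc_hyp):
        return ctc_hyp
    # Step 3: fall back to auto-regressive decoding from the accepted prefix
    return ar_decode(accepted)
```

Compared to standard speculative decoding, no separate draft network is needed: the CTC head that already exists in the hybrid encoder serves as the draft model, which is why the paper calls the scheme "self-speculative".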