Mar 9, 2026 · arXiv:2603.08397

NLE: Non-autoregressive LLM-based ASR by Transcript Editing

Avihu Dekel, Samuel Thomas, Takashi Fukada, George Saon

AI Summary

This paper introduces NLE, a non-autoregressive (NAR) approach to LLM-based ASR that reframes speech recognition as conditional transcript editing for fully parallel prediction. NLE uses a pretrained speech encoder to extract acoustic embeddings and an initial hypothesis, which a bidirectional LLM editor, trained with a latent alignment objective, then refines. The NLE++ model achieves a 5.67% average WER on the Open ASR leaderboard with an RTFx of 1630, and in single-utterance scenarios NLE delivers a 27x speedup over the autoregressive baseline.

Key Contribution

Ditch slow, sequential decoding: by recasting LLM-based ASR as non-autoregressive transcript editing, NLE achieves a 27x speedup over the autoregressive baseline in single-utterance scenarios.

Abstract

While autoregressive (AR) LLM-based ASR systems achieve strong accuracy, their sequential decoding limits parallelism and incurs high latency. We propose NLE, a non-autoregressive (NAR) approach that formulates speech recognition as conditional transcript editing, enabling fully parallel prediction. NLE extracts acoustic embeddings and an initial hypothesis from a pretrained speech encoder, then refines the hypothesis using a bidirectional LLM editor trained with a latent alignment objective. An interleaved padding strategy exploits the identity mapping bias of Transformers, allowing the model to focus on corrections rather than full reconstruction. On the Open ASR leaderboard, NLE++ achieves 5.67% average WER with an RTFx (inverse real-time factor) of 1630. In single-utterance scenarios, NLE achieves 27x speedup over the AR baseline, making it suitable for real-time applications.
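The two metrics quoted in the abstract can be made concrete with a short sketch. It assumes the standard definitions: RTFx is seconds of audio processed per second of compute, and WER is the word-level edit distance divided by the number of reference words. The function names `rtfx` and `word_error_rate` are illustrative, not taken from the paper, and the whitespace tokenization skips the text normalization that a leaderboard evaluation would normally apply.

```python
def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: audio duration divided by processing time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # reference word dropped by the hypothesis
                curr[j - 1] + 1,                  # extra word in the hypothesis
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Under these definitions, an RTFx of 1630 corresponds to 1630 seconds of audio transcribed per second of processing, and a WER of 5.67% corresponds to roughly 5.67 word-level edits per 100 reference words.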

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 51

Year: 2026

Venue: N/A
