This paper tackles the challenge of unifying offline and streaming ASR within a single RNN-Transducer model, employing chunk-limited attention with right context and dynamic chunked convolutions. The authors introduce mode-consistency regularization for RNNT (MCR-RNNT), implemented efficiently in Triton, to encourage agreement between the offline and streaming training modes. Results demonstrate improved streaming accuracy at low latency without sacrificing offline performance, even when scaling to larger models and datasets.
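To make the chunk-limited attention idea concrete, here is a minimal sketch of how such an attention mask could be built: each frame may attend to its own chunk, all earlier frames, and a fixed number of right-context frames beyond its chunk boundary. The function name and parameters are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def chunk_limited_mask(seq_len, chunk_size, right_context):
    """Illustrative boolean attention mask (True = may attend).

    Frame i attends to everything up to the end of its own chunk,
    plus `right_context` extra frames of lookahead past that boundary.
    Setting right_context=0 recovers a purely chunk-causal mask.
    """
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        chunk_end = ((i // chunk_size) + 1) * chunk_size  # exclusive chunk boundary
        allowed_end = min(seq_len, chunk_end + right_context)
        mask[i, :allowed_end] = True
    return mask
```

With a large enough chunk the mask becomes fully unmasked, which is what lets one model cover both the streaming (small chunk) and offline (full context) operating points.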
Bridging the offline-streaming gap in ASR is now more achievable: a single RNN-Transducer model can deliver high accuracy in both settings, thanks to a novel consistency regularization technique.
Unification of automatic speech recognition (ASR) systems reduces development and maintenance costs, but training a single model to perform well in both offline and low-latency streaming settings remains challenging. We present a Unified ASR framework for Transducer (RNNT) training that supports both offline and streaming decoding within a single model, using chunk-limited attention with right context and dynamic chunked convolutions. To further close the gap between offline and streaming performance, we introduce an efficient Triton implementation of mode-consistency regularization for RNNT (MCR-RNNT), which encourages agreement across training modes. Experiments show that the proposed approach improves streaming accuracy at low latency while preserving offline performance and scaling to larger model sizes and training datasets. The proposed Unified ASR framework and the English model checkpoint are open-sourced.
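The abstract does not spell out the form of the mode-consistency term, but a common way to encourage agreement between two training modes is a symmetric KL divergence between their output distributions. The sketch below is a hypothetical stand-in for MCR-RNNT: the real objective operates over the RNNT lattice and is implemented in Triton, whereas this toy version compares per-frame softmax distributions in plain NumPy.

```python
import numpy as np

def log_softmax(x):
    # numerically stable log-softmax over the last axis
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def mode_consistency_loss(offline_logits, streaming_logits):
    """Toy symmetric-KL consistency term between the offline
    (full-context) and streaming (chunked) modes of one model.

    Hypothetical illustration only: the paper's MCR-RNNT is defined
    over RNNT outputs and implemented efficiently in Triton.
    """
    log_p = log_softmax(offline_logits)
    log_q = log_softmax(streaming_logits)
    p, q = np.exp(log_p), np.exp(log_q)
    kl_pq = (p * (log_p - log_q)).sum(axis=-1).mean()
    kl_qp = (q * (log_q - log_p)).sum(axis=-1).mean()
    return 0.5 * (kl_pq + kl_qp)
```

Added to the usual transducer loss for each mode, a term like this pulls the streaming outputs toward the stronger offline outputs, which is one plausible mechanism for the reported accuracy gains at low latency.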