This paper introduces Align-Consistency, a consistency regularization method tailored to Align-Refine, a non-autoregressive (non-AR) ASR model that iteratively refines frame-level hypotheses. By applying consistency regularization to both the base CTC model and the refinement steps, the method makes the accuracy gains from non-AR decoding and from consistency regularization additive. It also exploits fast non-AR decoding to generate pseudo-labels for semi-supervised learning, yielding substantial further gains.
Non-autoregressive ASR models can achieve significant accuracy gains when consistency regularization is applied during both the initial CTC prediction and the subsequent refinement steps.
Consistency regularization (CR) improves the robustness and accuracy of Connectionist Temporal Classification (CTC) by ensuring predictions remain stable across input perturbations. In this work, we propose Align-Consistency, an extension of CR designed for Align-Refine -- a non-autoregressive (non-AR) model that performs iterative refinement of frame-level hypotheses. This method leverages the speed of parallel inference while significantly boosting recognition performance. The effectiveness of Align-Consistency is demonstrated in two settings. First, in the fully supervised setting, our results indicate that applying CR to both the base CTC model and the subsequent refinement steps is critical, and the accuracy improvements from non-AR decoding and CR are mutually additive. Second, for semi-supervised ASR, we employ fast non-AR decoding to generate online pseudo-labels on unlabeled data, which are used to further refine the supervised model and lead to substantial gains.
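To make the recipe concrete, below is a minimal PyTorch sketch of how CR could be attached to both the base CTC pass and each refinement step. The `encoder`, `refiner`, and `augment` callables, the symmetric-KL consistency term, and the weight `lam` are illustrative assumptions, not the paper's exact losses or architecture.

```python
# A minimal sketch of Align-Consistency-style training, assuming PyTorch.
# `encoder`, `refiner`, and `augment` are hypothetical placeholders; the
# paper does not specify these names or exact loss forms.
import torch
import torch.nn.functional as F

def consistency_kl(log_p1, log_p2):
    """Symmetric KL between two per-frame posterior distributions."""
    p1, p2 = log_p1.exp(), log_p2.exp()
    kl12 = F.kl_div(log_p2, p1, reduction="batchmean")  # KL(p1 || p2)
    kl21 = F.kl_div(log_p1, p2, reduction="batchmean")  # KL(p2 || p1)
    return 0.5 * (kl12 + kl21)

def align_consistency_loss(encoder, refiner, augment, x, x_lens, y, y_lens,
                           num_refine_steps=2, lam=1.0):
    total = 0.0
    # Two independently perturbed views of the same utterance,
    # e.g., two SpecAugment draws.
    views = [augment(x), augment(x)]

    # Base CTC pass: supervised CTC loss on each view, plus a
    # consistency term that keeps the two views' posteriors close.
    log_probs = []  # frame-level log-posteriors per view, shape (T, B, V)
    for v in views:
        lp = encoder(v)
        log_probs.append(lp)
        total = total + F.ctc_loss(lp, y, x_lens, y_lens)
    total = total + lam * consistency_kl(log_probs[0], log_probs[1])

    # Refinement passes: each step re-predicts frame-level posteriors
    # from the previous step's alignment, and CR is applied again.
    for _ in range(num_refine_steps):
        new_log_probs = []
        for i, v in enumerate(views):
            prev_align = log_probs[i].argmax(dim=-1)  # hard alignment
            lp = refiner(v, prev_align)
            new_log_probs.append(lp)
            total = total + F.ctc_loss(lp, y, x_lens, y_lens)
        total = total + lam * consistency_kl(new_log_probs[0],
                                             new_log_probs[1])
        log_probs = new_log_probs
    return total
```

In the semi-supervised setting described above, the same fast non-AR decoder would be run over unlabeled audio to produce pseudo-label targets `y` online, after which the identical loss can be applied.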