This paper introduces a training-free, intelligibility-guided observation addition (OA) method that improves ASR performance in noisy environments by fusing noisy speech with its enhanced counterpart. The fusion weights are derived from intelligibility estimates obtained directly from the backend ASR, avoiding the need to train a separate neural predictor. Experiments across various SE-ASR combinations and datasets show that the approach is robust and outperforms existing OA baselines.
Skip the extra training: this intelligibility-guided approach fuses noisy and enhanced speech for robust ASR without needing a separate neural predictor.
Automatic speech recognition (ASR) degrades severely in noisy environments. Although speech enhancement (SE) front-ends effectively suppress background noise, they often introduce artifacts that harm recognition. Observation addition (OA) addresses this issue by fusing noisy and SE-enhanced speech, improving recognition without modifying the parameters of the SE or ASR models. This paper proposes an intelligibility-guided OA method in which fusion weights are derived from intelligibility estimates obtained directly from the backend ASR. Unlike prior OA methods based on trained neural predictors, the proposed method is training-free, reducing complexity and enhancing generalization. Extensive experiments across diverse SE-ASR combinations and datasets demonstrate strong robustness and improvements over existing OA baselines. Additional analyses of intelligibility-guided switching-based alternatives and of frame-level versus utterance-level OA further validate the proposed design.
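As a rough illustration of the fusion step, observation addition can be sketched as a weighted sum of the noisy and enhanced waveforms. The abstract does not state how intelligibility estimates are turned into fusion weights, so everything below is an assumption made for illustration: the function names `observation_addition`, `weight_from_scores`, and `expand_frame_weights` are hypothetical, the proportional score-to-weight mapping is a placeholder rather than the paper's rule, and the intelligibility scores are taken as given numbers rather than computed from an ASR model.

```python
import numpy as np


def observation_addition(noisy, enhanced, weight):
    """Fuse two waveforms as weight * enhanced + (1 - weight) * noisy.

    `weight` may be a scalar (utterance-level OA) or a per-sample array
    (e.g. frame-level weights expanded to samples). Values are clipped to [0, 1].
    """
    noisy = np.asarray(noisy, dtype=np.float64)
    enhanced = np.asarray(enhanced, dtype=np.float64)
    if noisy.shape != enhanced.shape:
        raise ValueError("noisy and enhanced must have the same shape")
    w = np.clip(np.asarray(weight, dtype=np.float64), 0.0, 1.0)
    return w * enhanced + (1.0 - w) * noisy


def weight_from_scores(score_noisy, score_enhanced, eps=1e-8):
    """Hypothetical mapping from two non-negative intelligibility scores
    to the weight on the enhanced signal (NOT the paper's formula):
    the enhanced signal's share is proportional to its score."""
    total = score_noisy + score_enhanced
    if total < eps:
        return 0.5  # no evidence either way: mix equally
    return score_enhanced / total


def expand_frame_weights(frame_weights, hop, n_samples):
    """Hypothetical helper: repeat each (non-empty) frame weight `hop` times,
    then trim or edge-pad so the result has exactly `n_samples` entries."""
    w = np.repeat(np.asarray(frame_weights, dtype=np.float64), hop)[:n_samples]
    if w.size < n_samples:
        w = np.pad(w, (0, n_samples - w.size), mode="edge")
    return w


# Utterance-level usage with made-up scores and toy signals.
noisy = np.ones(4)
enhanced = np.zeros(4)
w_utt = weight_from_scores(score_noisy=0.25, score_enhanced=0.75)
fused_utt = observation_addition(noisy, enhanced, w_utt)

# Frame-level usage: one weight per frame, expanded to sample resolution.
w_frame = expand_frame_weights([0.0, 1.0], hop=2, n_samples=5)
fused_frame = observation_addition(np.ones(5), np.zeros(5), w_frame)
```

A scalar weight corresponds to utterance-level fusion, while a per-frame weight vector corresponds to frame-level fusion; in both cases the SE and ASR models themselves are left untouched, which matches the OA setting described above.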