This paper evaluates the intelligibility of classical and neural speech codecs under noisy conditions, with and without speech enhancement (SE) applied before coding. The results indicate that classical codecs are more robust to noise than neural codecs. Applying SE significantly improves intelligibility and reduces listening effort for the codecs that are otherwise degraded by noise, which highlights the importance of preprocessing for neural codec performance.
Classical speech codecs are still more robust than neural codecs in noisy environments, but speech enhancement before coding can narrow the gap.
Preserving speech intelligibility is a minimum requirement for speech codecs in communication. Recently, very low-bitrate neural codecs have gained interest for replacing classical codecs, reinforcing the need to evaluate whether intelligibility is preserved in realistic scenarios. In this paper, we evaluate the intelligibility and listening effort of classical and neural speech codecs in clean and noisy conditions. Further, we assess the impact of speech enhancement (SE) before coding, simulating a possible audio processing pipeline. The results show that classical codecs are more noise robust than neural codecs. Further, SE can lead to significant intelligibility and listening effort improvements for codecs otherwise negatively affected by noise. Listening effort reveals nuanced differences when intelligibility is saturated. Lastly, objective intelligibility based on automatic speech recognition is highly correlated with subjective intelligibility scores averaged per condition.
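The last finding, that ASR-based objective intelligibility tracks subjective scores averaged per condition, can be illustrated with a minimal sketch. The abstract does not specify the ASR system, the exact metric, or the correlation measure, so everything below is an assumption: word accuracy (one minus word error rate) stands in for the objective score, Pearson correlation stands in for the agreement measure, and the function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance between two transcripts."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Assumed objective score: 1 - WER, clipped at 0."""
    n_ref = len(reference.split())
    if n_ref == 0:
        raise ValueError("reference transcript is empty")
    return max(0.0, 1.0 - word_errors(reference, hypothesis) / n_ref)


def mean_per_condition(rows):
    """Average scores per condition; rows are (condition, score) pairs."""
    buckets = defaultdict(list)
    for condition, score in rows:
        buckets[condition].append(score)
    return {c: sum(v) / len(v) for c, v in buckets.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)
```

In use, each condition (for example a codec, a noise level, and SE on or off) would contribute many utterances. `word_accuracy` would be computed per utterance from the ASR output, both the objective and the subjective scores would be passed through `mean_per_condition`, and `pearson` would then be applied to the two lists of per-condition means, taken in the same condition order. Averaging before correlating matters here, since the abstract reports the correlation on condition-level scores rather than on individual utterances.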