This study introduces a method for improving the robustness of narrowband (NB) direction-of-arrival (DoA) estimation in distant microphone data. It applies single frequency filtering (SFF) to the microphone signals and cross-correlates the resulting speech-present time-frequency regions. The key finding is that the SFF-based NB estimator outperforms the state-of-the-art NB baseline in every tested environment and also surpasses some broadband (BB) estimators across the tested reverberation and noise conditions. The method addresses the main limitation of NB estimation, its susceptibility to spatial aliasing, while retaining the ability of NB methods to exploit frequency sparsity and estimate the DoAs of multiple speakers within a single time frame.
The SFF-based narrowband DoA estimator is more robust to spatial aliasing than the state-of-the-art narrowband baseline, and it also outperforms some broadband estimators under reverberant and noisy conditions.
For distant microphones, broadband (BB) methods for direction-of-arrival (DoA) estimation are more suitable than narrowband (NB) methods. Because BB estimators aggregate their optimization function across all frequency bands, they are robust to spatial aliasing, a known problem in processing distant microphone data. NB methods estimate the DoA using only local information in each frequency band, so their estimates are affected by spatial aliasing. However, unlike BB methods, NB methods exploit frequency sparsity to estimate the DoAs of multiple speakers in a single time frame. In this article, a method to improve the robustness of an NB DoA estimator to spatial aliasing is developed. The proposed method is based on cross-correlation of speech-present time-frequency regions obtained by single frequency filtering (SFF) of the microphone signals. The SFF spectrum is chosen for two reasons: SFF components have regions of high signal-to-noise ratio in both time and frequency, and speech/non-speech discrimination in the SFF domain is robust to degradations. The proposed NB estimator is compared to four state-of-the-art estimators (one NB and three BB) using detection and accuracy metrics on simulated and real-world data in different reverberation and noise conditions. The results show that, in all environments, the SFF-based NB approach outperforms the state-of-the-art NB approach. Furthermore, the SFF-based approach performs better than some of the BB estimators.
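The spatial aliasing problem described in the abstract can be illustrated with a minimal sketch. It is not the article's estimator. It assumes a single far-field source, a two-microphone pair, and a speed of sound of 343 m/s, and the function names (`aliasing_frequency`, `wrapped_phase`, `candidate_doas`) are hypothetical. Under these assumptions, the inter-microphone phase difference at one frequency is only observable modulo 2π, so a single frequency band can be consistent with several DoAs once the frequency exceeds c / (2d).

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the article


def aliasing_frequency(spacing_m, c=SPEED_OF_SOUND):
    """Frequency above which a two-microphone pair can be spatially aliased."""
    return c / (2.0 * spacing_m)


def wrapped_phase(freq_hz, spacing_m, doa_deg, c=SPEED_OF_SOUND):
    """Inter-microphone phase difference of a far-field source, wrapped to (-pi, pi]."""
    phase = 2.0 * math.pi * freq_hz * spacing_m * math.cos(math.radians(doa_deg)) / c
    return math.atan2(math.sin(phase), math.cos(phase))


def candidate_doas(freq_hz, spacing_m, observed_phase, c=SPEED_OF_SOUND):
    """All DoAs in [0, 180] degrees that are consistent with one wrapped phase."""
    scale = c / (2.0 * math.pi * freq_hz * spacing_m)
    # The unwrapped phase is bounded by 2*pi*f*d/c, which limits the wrap count.
    max_wraps = int(freq_hz * spacing_m / c) + 1
    doas = []
    for k in range(-max_wraps, max_wraps + 1):
        cos_theta = scale * (observed_phase + 2.0 * math.pi * k)
        if abs(cos_theta) <= 1.0 + 1e-9:
            cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
            doas.append(math.degrees(math.acos(cos_theta)))
    return sorted(doas)


true_doa = 60.0

# Closely spaced pair (5 cm) at 1 kHz: below the aliasing frequency.
close = candidate_doas(1000.0, 0.05, wrapped_phase(1000.0, 0.05, true_doa))

# Distant pair (50 cm) at 2 kHz: well above the aliasing frequency.
distant = candidate_doas(2000.0, 0.5, wrapped_phase(2000.0, 0.5, true_doa))

print("aliasing frequency, 5 cm pair:", aliasing_frequency(0.05))
print("aliasing frequency, 50 cm pair:", aliasing_frequency(0.5))
print("candidates, 5 cm pair:", close)
print("candidates, 50 cm pair:", distant)
```

For the closely spaced pair, only the true DoA is consistent with the observed phase. For the distant pair, the true DoA is one of several candidates that a single band cannot tell apart. A BB estimator can resolve the ambiguity by combining bands, since the spurious candidates shift with frequency while the true one does not. An NB estimator has to cope with the ambiguity within each band.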