This paper introduces an interpretable neural network architecture for speaker localization by unfolding the iterative Batch Expectation-Maximization (EM) algorithm. The encoder-EM-decoder structure enhances convergence and reduces sensitivity to initialization, a common issue in traditional EM. Experimental results demonstrate improved accuracy and robustness compared to the classical Batch-EM approach, particularly in reverberant environments.
Unfolding the EM algorithm into a neural network yields a speaker localization method that's more robust and accurate than traditional Batch-EM, especially in challenging acoustic conditions.
We propose an interpretable Batch-EM Unfolded Network for robust speaker localization. By embedding the iterative EM procedure within an encoder-EM-decoder architecture, the method mitigates initialization sensitivity and improves convergence. Experiments show higher accuracy and greater robustness than classical Batch-EM in reverberant conditions.
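To make the idea of "unfolding" concrete, the toy sketch below runs a fixed number of EM iterations as if each were one layer of a network. It is not the paper's model: it assumes a 1-D Gaussian mixture whose means are fixed candidate positions on a grid, updates only the mixture weights, and learns nothing. The names `em_step`, `unfolded_em`, `candidates`, and `sigma` are hypothetical, and the encoder and decoder stages mentioned in the abstract are omitted because the text does not describe them.

```python
import math
import random


def em_step(observations, candidates, weights, sigma=0.5):
    """One EM iteration that updates only the mixture weights (toy assumption)."""
    totals = [0.0] * len(candidates)
    for x in observations:
        # E-step: unnormalized log-posterior of each candidate for this observation.
        log_post = [
            math.log(w + 1e-300) - 0.5 * ((x - c) / sigma) ** 2
            for c, w in zip(candidates, weights)
        ]
        peak = max(log_post)  # subtract the max for numerical stability
        post = [math.exp(v - peak) for v in log_post]
        norm = sum(post)
        for k, p in enumerate(post):
            totals[k] += p / norm
    # M-step: each new weight is the average responsibility of its candidate.
    return [t / len(observations) for t in totals]


def unfolded_em(observations, candidates, init_weights, num_layers=10):
    """Apply a fixed number of EM iterations, one per 'layer'."""
    weights = list(init_weights)
    for _ in range(num_layers):
        weights = em_step(observations, candidates, weights)
    return weights


# Synthetic 1-D data around a hypothetical source position of 1.0.
random.seed(0)
candidates = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
observations = [random.gauss(1.0, 0.5) for _ in range(500)]
init_weights = [1.0 / len(candidates)] * len(candidates)

weights = unfolded_em(observations, candidates, init_weights)
estimate = candidates[weights.index(max(weights))]
print(estimate)
```

In an unfolded network, each pass through the loop would typically become a layer with its own trainable parameters, and the initial weights could come from a learned encoder instead of a uniform guess. That is one plausible reading of how the architecture reduces sensitivity to initialization, but the details here are an assumption.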