Mar 17, 2026 · arXiv:2603.16278

Speakers Localization Using Batch EM In Unfolding Neural Network

AI Summary

This paper introduces an interpretable neural network architecture for speaker localization by unfolding the iterative Batch Expectation-Maximization (EM) algorithm. The encoder-EM-decoder structure enhances convergence and reduces sensitivity to initialization, a common issue in traditional EM. Experimental results demonstrate improved accuracy and robustness compared to the classical Batch-EM approach, particularly in reverberant environments.

Key Contribution

Unfolding the EM algorithm into a neural network yields a speaker localization method that's more robust and accurate than traditional Batch-EM, especially in challenging acoustic conditions.

Abstract

We propose an interpretable Batch-EM Unfolded Network for robust speaker localization. By embedding the iterative EM procedure within an encoder-EM-decoder architecture, the method mitigates initialization sensitivity and improves convergence. Experiments show superior accuracy and robustness over the classical Batch-EM in reverberant conditions.
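To make the idea of unfolding concrete, the sketch below runs a fixed number of batch EM iterations as a stack of identical "layers". It is a toy illustration under stated assumptions, not the paper's model: the 1-D candidate grid, the Gaussian likelihood, the widths `sigma_model` and `sigma_obs`, and the names `em_layer` and `unfolded_em` are all hypothetical, and the encoder and decoder are omitted because the listing gives no detail on them. Only the mixture weights over candidate positions are updated here.

```python
import numpy as np

def em_layer(weights, lik):
    """One batch EM iteration for mixture weights (hypothetical helper).

    weights: (M,) current mixture weights over M candidate positions.
    lik:     (N, M) likelihood of each of N observations under each candidate.
    """
    # E-step: posterior responsibility of each candidate for each observation.
    joint = lik * weights
    resp = joint / joint.sum(axis=1, keepdims=True)
    # M-step: the new weights are the responsibilities averaged over the batch.
    return resp.mean(axis=0)

def unfolded_em(lik, init_weights, num_layers=10):
    """Apply a fixed number of EM iterations, one per 'layer'."""
    weights = init_weights
    for _ in range(num_layers):
        weights = em_layer(weights, lik)
    return weights

# Toy data (assumption): two sources on a 1-D grid of candidate positions.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 11)           # candidate positions
true_positions = np.array([0.2, 0.7])      # grid indices 2 and 7
sigma_obs, sigma_model = 0.02, 0.05        # assumed noise and model widths
obs = np.concatenate(
    [rng.normal(p, sigma_obs, size=200) for p in true_positions]
)

# Fixed Gaussian likelihood of every observation under every candidate.
lik = np.exp(-0.5 * ((obs[:, None] - grid[None, :]) / sigma_model) ** 2)

init = np.full(grid.size, 1.0 / grid.size)  # uniform initialization
weights = unfolded_em(lik, init, num_layers=10)
top_two = sorted(int(i) for i in np.argsort(weights)[-2:])
print("estimated positions:", grid[top_two])
```

In a trainable unfolded network, each layer would typically carry its own learnable parameters and the whole stack would be optimized end to end; this sketch keeps every layer identical so that it reduces to plain batch EM.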

Architecture Design (Transformers, SSMs, MoE) · Speech & Audio · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 9

Year: 2026

Venue: N/A
