The paper introduces Spatial-Magnifier, a neural network that generates virtual microphone signals from a limited set of real microphone measurements to improve the spatial directivity of multichannel speech enhancement. The approach matters because edge devices rarely have room for large microphone arrays. Experiments show that Spatial-Magnifier, combined with the Spatial Audio Representation Learning (SARL) framework, outperforms existing spatial upsampling baselines and nearly recovers the oracle performance obtained when all microphones are available.
Unlock near-oracle speech enhancement performance from compact microphone arrays by virtually expanding their spatial coverage with a novel neural network.
While the spatial directivity of multichannel speech enhancement algorithms improves with the number of microphones, fitting large capture arrays into real-world edge devices is typically limited by physical constraints. To overcome this limitation, we propose Spatial-Magnifier, a neural network designed to generate virtual microphone (VM) signals from a limited set of real microphone (RM) measurements. Moreover, we introduce the Spatial Audio Representation Learning (SARL) framework, which leverages estimated VM signals and features to condition a downstream speech enhancement system. Experimental results demonstrate that the proposed framework outperforms existing spatial upsampling baselines across various speech extraction systems, including end-to-end multichannel speech enhancement and neural beamforming. The proposed method nearly recovers the oracle performance achieved when all microphones are available.
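To make the data flow in the abstract concrete, the sketch below shows only the channel bookkeeping of spatial upsampling: a few real microphone (RM) channels go in, estimated virtual microphone (VM) channels come out, and the two are stacked to form the input for a downstream enhancement system. It is not the paper's method. The function `estimate_virtual_mics` is a hypothetical stand-in that linearly interpolates between adjacent RM channels, a naive baseline chosen here purely for illustration, whereas Spatial-Magnifier is a learned neural network whose architecture is not described in this summary. The names `augment_channels`, `num_vm`, and the array shapes are likewise assumptions.

```python
import numpy as np


def estimate_virtual_mics(rm: np.ndarray, num_vm: int) -> np.ndarray:
    """Hypothetical stand-in for a learned VM estimator.

    Linearly interpolates between adjacent real-microphone channels.
    rm has shape (num_rm, num_samples); the result has shape
    (num_vm, num_samples). This is NOT Spatial-Magnifier itself.
    """
    num_rm = rm.shape[0]
    # Fractional positions of the virtual mics along the RM index axis,
    # strictly between the first and last real microphone.
    positions = np.linspace(0, num_rm - 1, num_vm + 2)[1:-1]
    lo = np.floor(positions).astype(int)
    hi = np.minimum(lo + 1, num_rm - 1)
    w = (positions - lo)[:, None]  # interpolation weight per virtual mic
    return (1.0 - w) * rm[lo] + w * rm[hi]


def augment_channels(rm: np.ndarray, vm: np.ndarray) -> np.ndarray:
    """Stack real and estimated virtual channels for a downstream system."""
    return np.concatenate([rm, vm], axis=0)


# Assumed toy setup: 2 real microphones, 1 second of audio at 16 kHz.
rng = np.random.default_rng(0)
rm = rng.standard_normal((2, 16000))

vm = estimate_virtual_mics(rm, num_vm=2)
augmented = augment_channels(rm, vm)
print(augmented.shape)  # (4, 16000)
```

In the framework the abstract describes, the stacked channels (and, per the abstract, VM features) would condition an end-to-end enhancement model or a neural beamformer. Replacing the interpolation with a trained estimator is what the paper reports as closing most of the gap to the all-microphone oracle.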