This paper introduces a neural network-based time-frequency-bin-wise linear combination (NN-TFLC) framework for target source extraction in underdetermined mixtures, addressing the limitations of independent TF bin processing in previous methods. The NN-TFLC framework constructs MPDR beamformers without explicit noise covariance estimation and uses a cross-attention mechanism to predict temporally and spectrally coherent linear combination weights. Experiments on dual-microphone mixtures demonstrate that NN-TFLC-MPDR outperforms TFS/TFLC-MPDR and achieves competitive performance compared to TFS/TFLC-MVDR, which relies on noise priors.
Ditch the noise priors: a new neural beamforming approach uses cross-attention to extract target sources from underdetermined audio mixtures, outperforming MPDR-based prior art and remaining competitive with MVDR-based methods that require noise priors.
Extracting a target source from underdetermined mixtures is challenging for beamforming approaches. Recently proposed time-frequency-bin-wise switching (TFS) and linear combination (TFLC) strategies mitigate this by combining multiple beamformers in each time-frequency (TF) bin and choosing combination weights that minimize the output power. However, making this decision independently for each TF bin can weaken temporal-spectral coherence, causing discontinuities and consequently degrading extraction performance. In this paper, we propose a novel neural network-based time-frequency-bin-wise linear combination (NN-TFLC) framework that constructs minimum power distortionless response (MPDR) beamformers without explicit noise covariance estimation. The network encodes the mixture and beamformer outputs, and predicts temporally and spectrally coherent linear combination weights via a cross-attention mechanism. On dual-microphone mixtures with multiple interferers, NN-TFLC-MPDR consistently outperforms TFS/TFLC-MPDR and achieves competitive performance with TFS/TFLC built on minimum variance distortionless response (MVDR) beamformers, which require noise priors.
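As a rough illustration of the classical building blocks the abstract refers to (not the proposed network), the sketch below computes MPDR weights from the mixture covariance alone and then applies per-bin switching in the TFS style, picking the beamformer with the lowest output power in each frame. The function names `mpdr_weights` and `tfs_select`, the diagonal loading term, and the idea of building two beamformers from two halves of the frames are assumptions made for this example; the paper's actual construction of the candidate beamformers may differ.

```python
import numpy as np


def mpdr_weights(X, d, diag_load=1e-6):
    """MPDR weights for one frequency bin (hypothetical helper).

    X: (M, T) complex STFT frames from M microphones.
    d: (M,) steering vector of the target.
    Uses the mixture covariance only, so no noise covariance is needed.
    """
    M, T = X.shape
    R = X @ X.conj().T / T
    # Diagonal loading is an assumption added here for numerical stability.
    R = R + diag_load * np.trace(R).real / M * np.eye(M)
    Rinv_d = np.linalg.solve(R, d)
    # w = R^{-1} d / (d^H R^{-1} d), which satisfies w^H d = 1.
    return Rinv_d / (d.conj() @ Rinv_d)


def tfs_select(Y):
    """Per-frame switching among K beamformer outputs (hypothetical helper).

    Y: (K, T) outputs of K beamformers at one frequency.
    Returns the minimum-power output per frame and the chosen indices.
    """
    idx = np.argmin(np.abs(Y) ** 2, axis=0)
    return Y[idx, np.arange(Y.shape[1])], idx


# Toy data: 2 microphones, 200 frames, one frequency bin (random, illustrative).
rng = np.random.default_rng(0)
M, T = 2, 200
X = rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))
d = np.array([1.0, np.exp(-1j * 0.3)])  # assumed steering vector

# Two candidate beamformers, each estimated from one half of the frames.
W = np.stack([mpdr_weights(X[:, : T // 2], d), mpdr_weights(X[:, T // 2 :], d)])
Y = W.conj() @ X          # (K, T): y_k[t] = w_k^H x[t]
y_tfs, chosen = tfs_select(Y)
```

Because `tfs_select` decides independently for every frame, the index sequence `chosen` can jump between beamformers from one frame to the next. That is the coherence problem the abstract describes, and it is what motivates predicting the combination weights jointly across time and frequency.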