Mar 16, 2026 · arXiv:2603.15288

Neural Network-Based Time-Frequency-Bin-Wise Linear Combination of Beamformers for Underdetermined Target Source Extraction

Changda Chen, Yichen Yang, Wei Liu, Shoji Makino

AI Summary

This paper introduces a neural network-based time-frequency-bin-wise linear combination (NN-TFLC) framework for target source extraction in underdetermined mixtures. It addresses a limitation of earlier methods, which choose combination weights independently in each time-frequency (TF) bin. The NN-TFLC framework constructs minimum power distortionless response (MPDR) beamformers without explicit noise covariance estimation and uses a cross-attention mechanism to predict temporally and spectrally coherent linear combination weights. Experiments on dual-microphone mixtures show that NN-TFLC-MPDR outperforms TFS/TFLC-MPDR and performs competitively with TFS/TFLC-MVDR, which relies on noise priors.

Key Contribution

Drops the noise priors: a neural beamforming approach uses cross-attention to predict coherent combination weights for MPDR beamformers, outperforming TFS/TFLC-MPDR and remaining competitive with MVDR-based TFS/TFLC variants that require noise priors.

Abstract

Extracting a target source from underdetermined mixtures is challenging for beamforming approaches. Recently proposed time-frequency-bin-wise switching (TFS) and linear combination (TFLC) strategies mitigate this by combining multiple beamformers in each time-frequency (TF) bin and choosing combination weights that minimize the output power. However, making this decision independently for each TF bin can weaken temporal-spectral coherence, causing discontinuities and consequently degrading extraction performance. In this paper, we propose a novel neural network-based time-frequency-bin-wise linear combination (NN-TFLC) framework that constructs minimum power distortionless response (MPDR) beamformers without explicit noise covariance estimation. The network encodes the mixture and beamformer outputs, and predicts temporally and spectrally coherent linear combination weights via a cross-attention mechanism. On dual-microphone mixtures with multiple interferers, NN-TFLC-MPDR consistently outperforms TFS/TFLC-MPDR and achieves competitive performance with TFS/TFLC built on the minimum variance distortionless response (MVDR) beamformers that require noise priors.
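The non-neural baseline described in the abstract can be illustrated with a minimal NumPy sketch: MPDR weights computed from the mixture covariance alone, followed by time-frequency-bin-wise switching that keeps, in each bin, the beamformer output with the lowest power. This is a sketch under stated assumptions, not the authors' implementation. The function names, the diagonal loading `eps`, the all-ones steering vectors, and the choice to build one beamformer per time block are all hypothetical and chosen only for illustration. Switching corresponds to restricting the combination weights to one-hot vectors; the linear combination and the neural weight predictor are not shown.

```python
import numpy as np


def mpdr_weights(X, d, eps=1e-6):
    """MPDR weights for one frequency bin.

    X: (M, T) complex mixture STFT frames, d: (M,) steering vector.
    Only the mixture covariance is used; no noise covariance is estimated.
    """
    M, T = X.shape
    R = X @ X.conj().T / T + eps * np.eye(M)  # diagonal loading (assumed)
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d.conj() @ Rinv_d)       # satisfies w^H d = 1


def tf_switch(Y):
    """Pick, per TF bin, the beamformer output with minimum power.

    Y: (K, F, T) outputs of K beamformers. Returns the (F, T) output
    and the (F, T) index map of the selected beamformer.
    """
    idx = np.argmin(np.abs(Y) ** 2, axis=0)
    out = np.take_along_axis(Y, idx[None], axis=0)[0]
    return out, idx


# Toy dual-microphone example with random data (illustration only).
rng = np.random.default_rng(0)
M, F, T, K = 2, 4, 60, 3
X = rng.standard_normal((M, F, T)) + 1j * rng.standard_normal((M, F, T))
d = np.ones((F, M), dtype=complex)  # hypothetical steering vectors

# Assumed construction: one MPDR beamformer per time block.
blocks = np.array_split(np.arange(T), K)
Y = np.empty((K, F, T), dtype=complex)
for k, blk in enumerate(blocks):
    for f in range(F):
        Xf = X[:, f]                          # (M, T) frames of bin f
        w = mpdr_weights(Xf[:, blk], d[f])
        Y[k, f] = w.conj() @ Xf               # apply w^H to every frame

out, idx = tf_switch(Y)
```

Because `tf_switch` decides each bin in isolation, neighbouring entries of `idx` can change abruptly; this is the loss of temporal-spectral coherence that the abstract attributes to independent per-bin decisions.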

Architecture Design (Transformers, SSMs, MoE) · Speech & Audio · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
