Huawei | NJU | Northwestern | Apr 8, 2026 | arXiv:2604.06810

EvoTSE: Evolving Enrollment for Target Speaker Extraction

Ziqian Wang, Xingchen Li, Yike Zhu, Longshuai Xiao

AI Summary

The paper introduces EvoTSE, an evolving target speaker extraction framework that iteratively refines the enrollment signal using reliability-filtered retrieval of historical high-confidence speaker estimates. This addresses speaker confusion and reduces reliance on high-quality pre-recorded enrollment, especially in out-of-domain scenarios. Experiments on multiple benchmarks demonstrate consistent improvements over conventional TSE methods without requiring additional annotated data.

Key Contribution

Instead of relying on a static enrollment, EvoTSE updates the target speaker's enrollment dynamically during extraction, which improves performance, especially in out-of-domain conditions.

Abstract

Target Speaker Extraction (TSE) aims to isolate a specific speaker's voice from a mixture, guided by a pre-recorded enrollment. While TSE bypasses the global permutation ambiguity of blind source separation, it remains vulnerable to speaker confusion, where models mistakenly extract the interfering speaker. Furthermore, conventional TSE relies on a static inference pipeline, where performance is limited by the quality of the fixed enrollment. To overcome these limitations, we propose EvoTSE, an evolving TSE framework in which the enrollment is continuously updated through reliability-filtered retrieval over high-confidence historical estimates. This mechanism reduces speaker confusion and relaxes the quality requirements for pre-recorded enrollment without relying on additional annotated data. Experiments across multiple benchmarks demonstrate that EvoTSE achieves consistent improvements, especially when evaluated on out-of-domain (OOD) scenarios. Our code and checkpoints are available.
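
The abstract does not say how the reliability filter, the retrieval step, or the enrollment update are implemented. The Python sketch below is only a rough illustration of the general idea, not the authors' method. Everything in it is an assumption made for the example: the class name `EvolvingEnrollment`, the use of fixed-length speaker embeddings, the confidence threshold, cosine-similarity retrieval, and the simple averaging update.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


@dataclass
class EvolvingEnrollment:
    """Hypothetical memory of past estimates used to refresh the enrollment."""

    initial: list                 # embedding of the pre-recorded enrollment
    min_confidence: float = 0.8   # assumed reliability threshold
    top_k: int = 2                # assumed number of retrieved estimates
    memory: list = field(default_factory=list)  # (embedding, confidence) pairs

    def add_estimate(self, embedding, confidence):
        """Store an estimate only if it passes the reliability filter."""
        if confidence >= self.min_confidence:
            self.memory.append((embedding, confidence))
            return True
        return False

    def current(self):
        """Average the initial enrollment with the top-k stored estimates
        that are most similar to it (assumed update rule)."""
        ranked = sorted(
            self.memory,
            key=lambda item: cosine(self.initial, item[0]),
            reverse=True,
        )
        chosen = [self.initial] + [emb for emb, _ in ranked[: self.top_k]]
        dim = len(self.initial)
        return [sum(vec[i] for vec in chosen) / len(chosen) for i in range(dim)]


# Usage: a confident estimate is stored, a low-confidence one is rejected.
enrollment = EvolvingEnrollment(initial=[1.0, 0.0])
enrollment.add_estimate([0.8, 0.2], confidence=0.95)  # passes the filter
enrollment.add_estimate([0.0, 1.0], confidence=0.30)  # rejected by the filter
updated = enrollment.current()
```

In this toy setup the rejected estimate never influences `updated`, which mirrors the stated goal of keeping unreliable (possibly confused) outputs out of the evolving enrollment. How confidence is actually measured in EvoTSE is not described in the abstract.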

Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
