CMU ML | Feb 17, 2026 | arXiv:2602.15519

Enroll-on-Wakeup: A First Comparative Study of Target Speech Extraction for Seamless Interaction in Real Noisy Human-Machine Dialogue Scenarios

Yiming Yang, Guangyong Wang, Haixin Guan, Yanhua Long

AI Summary

The paper introduces Enroll-on-Wakeup (EoW), a framework that uses the wake-word segment as the enrollment reference for target speech extraction (TSE), removing the need for pre-recorded enrollment speech. The authors conduct a comparative study of discriminative and generative TSE models under real-world noisy conditions within the EoW framework. Their results show that existing TSE models degrade in the EoW setting because the enrollment segments are short and noisy, and that LLM-based TTS enrollment augmentation significantly improves the listening experience, although gaps remain in speech recognition accuracy.

Key Contribution

Ditching pre-recorded enrollment speech, this work shows how wake words can bootstrap target speech extraction, paving the way for more natural human-machine dialogues.

Abstract

Target speech extraction (TSE) typically relies on pre-recorded, high-quality enrollment speech, which disrupts the user experience and limits feasibility in spontaneous interaction. In this paper, we propose Enroll-on-Wakeup (EoW), a novel framework in which the wake-word segment, captured naturally during human-machine interaction, is automatically used as the enrollment reference. This eliminates the need for pre-collected speech and enables a seamless experience. We perform the first systematic study of EoW-TSE, evaluating advanced discriminative and generative models under diverse real-world acoustic conditions. Given the short and noisy nature of wake-word segments, we investigate enrollment augmentation using LLM-based TTS. Results show that while current TSE models suffer performance degradation in EoW-TSE, TTS-based assistance significantly enhances the listening experience, though gaps remain in speech recognition accuracy.
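The enrollment idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `EoWInput` container, the `split_on_wakeup` function, and all parameter names are hypothetical, the wake-word boundaries are assumed to come from a keyword-spotting front end that is not modelled here, and appending synthesized audio by plain concatenation is only one assumed way to realize "enrollment augmentation".

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class EoWInput:
    """Enrollment and mixture segments cut from one recording (hypothetical container)."""
    enrollment: List[float]  # wake-word samples, used as the speaker reference
    mixture: List[float]     # samples after the wake word, to be passed to a TSE model


def split_on_wakeup(
    samples: Sequence[float],
    sample_rate: int,
    wake_start_s: float,
    wake_end_s: float,
    tts_augmentation: Optional[Sequence[float]] = None,
) -> EoWInput:
    """Cut the wake-word segment out of a recording and use it as enrollment.

    wake_start_s and wake_end_s are assumed to be supplied by a
    keyword-spotting front end; detecting them is out of scope here.
    """
    if not 0 <= wake_start_s < wake_end_s:
        raise ValueError("wake-word boundaries must satisfy 0 <= start < end")

    # Convert boundaries from seconds to sample indices, clamped to the recording.
    start = int(round(wake_start_s * sample_rate))
    end = min(int(round(wake_end_s * sample_rate)), len(samples))
    if start >= end:
        raise ValueError("wake-word segment lies outside the recording")

    enrollment = list(samples[start:end])

    # Optional augmentation: append extra enrollment audio, for example speech
    # synthesized in the target voice. Plain concatenation is an assumption.
    if tts_augmentation is not None:
        enrollment.extend(tts_augmentation)

    return EoWInput(enrollment=enrollment, mixture=list(samples[end:]))
```

In such a setup, `enrollment` and `mixture` would then be handed to whichever TSE model is under evaluation, discriminative or generative. Because the wake word typically lasts only a second or so and is recorded in the same noisy environment as the command, the reference is both short and noisy, which is the difficulty the study examines.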

Natural Language Processing, Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
