Mar 4, 2026 · arXiv:2603.04296

FlowW2N: Whispered-to-Normal Speech Conversion via Flow-Matching

Fabian Ritter-Gutierrez, Md Asif Jalal, Pablo Peso Parada, Karthikeyan Saravanan, Yusun Shul, Minseung Kim, Gun-Woo Lee, Han-Gil Moon

AI Summary

The paper introduces FlowW2N, a whispered-to-normal speech conversion method using conditional flow matching trained on synthetic, time-aligned whisper-normal pairs. It leverages domain-invariant ASR embeddings to generalize to real whispers without requiring real paired data, addressing the temporal misalignment and data scarcity challenges in W2N conversion. FlowW2N achieves state-of-the-art intelligibility on CHAINS and wTIMIT datasets, significantly reducing Word Error Rate compared to previous methods.

Key Contribution

Achieves SOTA whispered-to-normal speech conversion while training exclusively on synthetic data, using domain-invariant ASR embeddings to bridge the gap to real-world whispers.

Abstract

Whispered-to-normal (W2N) speech conversion aims to reconstruct missing phonation from whispered input while preserving content and speaker identity. This task is challenging due to temporal misalignment between whispered and voiced recordings and the lack of paired data. We propose FlowW2N, a conditional flow matching approach that trains exclusively on synthetic, time-aligned whisper-normal pairs and conditions on domain-invariant features. We exploit high-level ASR embeddings that exhibit strong invariance between synthetic and real whispered speech, enabling generalization to real whispers despite never observing them during training. We verify this invariance across ASR layers and propose a selection criterion optimizing content informativeness and cross-domain invariance. Our method achieves SOTA intelligibility on the CHAINS and wTIMIT datasets, reducing Word Error Rate by 26-46% relative to prior work while using only 10 steps at inference and requiring no real paired data.
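To make the "conditional flow matching" and "10 steps at inference" phrases concrete, here is a minimal sketch of the generic technique, not the authors' implementation. It assumes a linear interpolation path between noise and target features and a fixed-step Euler solver; the abstract does not specify the path, the solver, or the network. The names `cfm_training_pair`, `euler_sample`, and `oracle_velocity` are hypothetical, and `oracle_velocity` is a toy stand-in for a trained network: it uses the known target as its conditioning signal, whereas a real model would condition on ASR embeddings of the whispered input.

```python
import numpy as np

def cfm_training_pair(x0, x1, t):
    """Build one flow-matching training example on a linear path (assumed).

    x0: noise sample, x1: target features (e.g. a normal-speech spectrogram),
    t: scalar in [0, 1]. A network would be regressed onto v_target at (x_t, t).
    """
    x_t = (1.0 - t) * x0 + t * x1   # point on the straight line from x0 to x1
    v_target = x1 - x0              # constant velocity along that line
    return x_t, v_target

def euler_sample(velocity_fn, x0, cond, num_steps=10):
    """Integrate dx/dt = velocity_fn(x, t, cond) from t=0 to t=1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t, cond)
    return x

def oracle_velocity(x, t, cond):
    """Toy stand-in for a trained model (assumption): here `cond` is the target
    itself, so the exact velocity toward it is known in closed form."""
    return (cond - x) / (1.0 - t)

# Toy usage: 4 "utterances" with 80-dimensional feature frames (arbitrary sizes).
rng = np.random.default_rng(0)
noise = rng.standard_normal((4, 80))
target = rng.standard_normal((4, 80))
generated = euler_sample(oracle_velocity, noise, cond=target, num_steps=10)
```

With the oracle field the ten Euler steps land on the target, which only illustrates the sampling loop; the quality of a real system depends on how well the learned velocity field approximates the true one.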

Data Curation & Synthetic Data, Natural Language Processing, Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 34

Year: 2026

Venue: N/A
