NUS, Shenzhen Loop Area Institute | Feb 22, 2026 | arXiv:2602.19166

CosyAccent: Duration-Controllable Accent Normalization Using Source-Synthesis Training Data

Qibing Bai, Shuhao Shi, Yukai Ju, Yannan Wang, Haizhou Li

AI Summary

The paper introduces a "source-synthesis" methodology for constructing accent normalization (AN) training data: the source L2 speech is generated, and authentic native speech serves as the training target, which avoids both reliance on real L2 data and learning from TTS artifacts. It also proposes CosyAccent, a non-autoregressive model that balances prosodic naturalness with explicit control over total output duration. Results show that CosyAccent, trained without any real L2 speech, achieves better content preservation and naturalness than baselines trained on real-world data.

Key Contribution

Forget collecting real L2 speech data: this accent normalization method trains on synthetic L2 speech generated from text, achieving better content preservation and naturalness than models trained on real data.

Abstract

Accent normalization (AN) systems often struggle with unnatural outputs and undesired content distortion, stemming from both suboptimal training data and rigid duration modeling. In this paper, we propose a "source-synthesis" methodology for training data construction. By generating source L2 speech and using authentic native speech as the training target, our approach avoids learning from TTS artifacts and, crucially, requires no real L2 data in training. Alongside this data strategy, we introduce CosyAccent, a non-autoregressive model that resolves the trade-off between prosodic naturalness and duration control. CosyAccent implicitly models rhythm for flexibility yet offers explicit control over total output duration. Experiments show that, despite being trained without any real L2 speech, CosyAccent achieves significantly improved content preservation and superior naturalness compared to strong baselines trained on real-world data.
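The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `Utterance`, `TrainingPair`, `build_pairs`, `synthesize_l2`, and `total_output_frames` are hypothetical names introduced here, the waveform type is a plain list standing in for real audio, and the abstract does not say how the L2 source is generated or how the duration target is parameterized. The sketch only shows the direction of the pairing (generated L2 speech as input, authentic native speech as target) and one assumed way to express a total-duration target as a frame count.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Waveform = List[float]  # stand-in for a real audio array


@dataclass
class Utterance:
    text: str               # transcript of the native recording
    native_audio: Waveform  # authentic native speech


@dataclass
class TrainingPair:
    source_l2: Waveform      # generated L2 speech, used as model input
    target_native: Waveform  # authentic native speech, used as training target


def build_pairs(
    utterances: Sequence[Utterance],
    synthesize_l2: Callable[[Utterance], Waveform],
) -> List[TrainingPair]:
    """Pair each native recording with a generated L2 source.

    `synthesize_l2` is a hypothetical callable; the abstract does not
    specify the generator, so any system producing L2 speech fits here.
    """
    return [TrainingPair(synthesize_l2(u), u.native_audio) for u in utterances]


def total_output_frames(source_frames: int, duration_ratio: float = 1.0) -> int:
    """Assumed form of explicit total-duration control.

    Only the total length is fixed; how frames are distributed over
    phones (rhythm) would be left to the model.
    """
    if source_frames <= 0:
        raise ValueError("source_frames must be positive")
    if duration_ratio <= 0:
        raise ValueError("duration_ratio must be positive")
    return max(1, round(source_frames * duration_ratio))
```

Because the target side of every pair is a real recording, any artifacts of the generator stay on the input side, which is the property the abstract emphasizes. A ratio of 1.0 in `total_output_frames` keeps the source length, which is one plausible default when the output must stay aligned with the input timing.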

Data Curation & Synthetic Data, Natural Language Processing, Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
