USC | Mar 8, 2026 | arXiv:2603.07550

Learning-free L2-Accented Speech Generation using Phonological Rules

Yoonjeong Lee, Jihwan Lee, Tiantian Feng

AI Summary

This paper introduces a learning-free accented text-to-speech (TTS) framework that leverages phonological rules applied to phoneme sequences in conjunction with a multilingual TTS model. The method transforms accent at the phoneme level without requiring accented training data, enabling fine-grained control over accent while preserving intelligibility. Rule sets were designed for Spanish- and Indian-accented English, modeling phonological differences in consonants, vowels, and syllable structure.

Key Contribution

Achieves accent-specific speech synthesis without any accented training data by combining phonological rules with a multilingual TTS model.

Abstract

Accent plays a crucial role in speaker identity and inclusivity in speech technologies. Existing accented text-to-speech (TTS) systems either require large-scale accented datasets or lack fine-grained phoneme-level controllability. We propose an accented TTS framework that combines phonological rules with a multilingual TTS model. The rules are applied to phoneme sequences to transform accent at the phoneme level while preserving intelligibility. The method requires no accented training data and enables explicit phoneme-level accent manipulation. We design rule sets for Spanish- and Indian-accented English, modeling systematic differences in consonants, vowels, and syllable structure arising from phonotactic constraints. We analyze the trade-off between phoneme-level duration alignment and accent as realized in speech timing. Experimental results demonstrate effective accent shift while maintaining speech quality.
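
To make the idea of rewriting phoneme sequences concrete, here is a minimal Python sketch of rule application. It is not the paper's rule set or implementation: the substitution table, the consonant inventory, and the function names (`substitute`, `epenthesize_initial_s_cluster`, `apply_rules`) are all illustrative assumptions, chosen from commonly described features of Spanish-accented English (such as /z/ realized as /s/ and a vowel inserted before word-initial /s/ + consonant clusters).

```python
from typing import Callable, List

Phonemes = List[str]
Rule = Callable[[Phonemes], Phonemes]

# Illustrative segment substitutions (assumed, not taken from the paper).
SUBSTITUTIONS = {"z": "s", "v": "b", "ɪ": "i"}

# Hypothetical, deliberately small consonant inventory for the cluster check.
CONSONANTS = {"p", "t", "k", "b", "d", "g", "m", "n", "l", "f", "w"}


def substitute(phonemes: Phonemes) -> Phonemes:
    """Replace each phoneme that has an entry in SUBSTITUTIONS."""
    return [SUBSTITUTIONS.get(p, p) for p in phonemes]


def epenthesize_initial_s_cluster(phonemes: Phonemes) -> Phonemes:
    """Insert /e/ before a word-initial /s/ + consonant cluster."""
    if len(phonemes) >= 2 and phonemes[0] == "s" and phonemes[1] in CONSONANTS:
        return ["e"] + list(phonemes)
    return list(phonemes)


def apply_rules(phonemes: Phonemes, rules: List[Rule]) -> Phonemes:
    """Apply the rules in order; each rule sees the previous rule's output."""
    for rule in rules:
        phonemes = rule(phonemes)
    return phonemes


# Epenthesis runs first so that a /z/ rewritten to /s/ does not trigger it.
SPANISH_LIKE_RULES: List[Rule] = [epenthesize_initial_s_cluster, substitute]

if __name__ == "__main__":
    print(apply_rules(["s", "k", "u", "l"], SPANISH_LIKE_RULES))
    print(apply_rules(["z", "ɪ", "p"], SPANISH_LIKE_RULES))
```

In a pipeline like the one the abstract describes, the rewritten sequence would then be passed to a multilingual TTS model that accepts phoneme input; that step is omitted here because the model's interface is not specified in this summary.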

Natural Language Processing Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
