Mar 15, 2026

arXiv:2603.14328

CodecMOS-Accent: A MOS Benchmark of Resynthesized and TTS Speech from Neural Codecs Across English Accents

Wen-Chin Huang, Nicholas Sanders, Erica Cooper

AI Summary

The paper introduces CodecMOS-Accent, a new MOS benchmark for evaluating neural audio codecs and LLM-based TTS models, with a focus on accented speech. The dataset includes 4,000 resynthesis and TTS samples from 24 systems, covering 32 speakers across ten accents, and is annotated with 19,600 subjective ratings for naturalness, speaker similarity, and accent similarity. Analysis of the benchmark reveals relationships between speaker and accent similarity, the predictive power of objective metrics, and accent-based perceptual biases in listeners.

Key Contribution

Accented speech reveals perceptual biases in speech synthesis evaluation: listeners rate speakers with matching accents as more natural.

Abstract

We present the CodecMOS-Accent dataset, a mean opinion score (MOS) benchmark designed to evaluate neural audio codec (NAC) models and the large language model (LLM)-based text-to-speech (TTS) models trained upon them, especially on non-standard speech such as accented speech. The dataset comprises 4,000 codec resynthesis and TTS samples from 24 systems, featuring 32 speakers spanning ten accents. A large-scale subjective test was conducted to collect 19,600 annotations from 25 listeners across three dimensions: naturalness, speaker similarity, and accent similarity. The dataset not only represents an up-to-date study of recent speech synthesis system performance but also reveals insights including a tight relationship between speaker and accent similarity, the predictive power of objective metrics, and a perceptual bias when listeners share the speaker's accent. It is expected to foster research on more human-centric evaluation for NAC and accented TTS.

Eval Frameworks & Benchmarks, Natural Language Processing, Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
