Mar 16, 2026

arXiv:2603.15352

NV-Bench: Benchmark of Nonverbal Vocalization Synthesis for Expressive Text-to-Speech Generation

Qinke Ni, Huan Liao, Dekun Chen, Yuxiang Wang, Zhizheng Wu

AI Summary

NV-Bench is introduced as the first benchmark for evaluating nonverbal vocalization (NV) synthesis in TTS systems, addressing the lack of standardized metrics and ground-truth references. It comprises 1,651 multilingual utterances across 14 NV categories, paired with human reference audio. The benchmark employs a dual-dimensional evaluation protocol, assessing instruction alignment using paralinguistic character error rate (PCER) and acoustic fidelity by measuring the distributional gap to real recordings.

Key Contribution

A benchmark that rigorously tests how well TTS models synthesize nonverbal vocalizations such as laughter, sighs, and gasps, assessing their communicative function rather than relying on simple acoustic metrics alone.

Abstract

While recent text-to-speech (TTS) systems increasingly integrate nonverbal vocalizations (NVs), their evaluations lack standardized metrics and reliable ground-truth references. To bridge this gap, we propose NV-Bench, the first benchmark grounded in a functional taxonomy that treats NVs as communicative acts rather than acoustic artifacts. NV-Bench comprises 1,651 multi-lingual, in-the-wild utterances with paired human reference audio, balanced across 14 NV categories. We introduce a dual-dimensional evaluation protocol: (1) Instruction Alignment, utilizing the proposed paralinguistic character error rate (PCER) to assess controllability, (2) Acoustic Fidelity, measuring the distributional gap to real recordings to assess acoustic realism. We evaluate diverse TTS models and develop two baselines. Experimental results demonstrate a strong correlation between our objective metrics and human perception, establishing NV-Bench as a standardized evaluation framework.
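The abstract names PCER but does not define it. As a rough illustration only, the sketch below assumes a metric in the spirit of a standard character error rate in which each NV tag counts as a single token, so a dropped or swapped tag costs one edit. The bracketed tag format (`[laugh]`, `[sigh]`) and the function names `tokenize`, `edit_distance`, and `tag_aware_cer` are hypothetical and are not taken from the paper.

```python
import re

# Assumption: NV events appear inline as bracketed tags such as [laugh] or [sigh].
# This tag format is hypothetical; the paper's actual annotation scheme may differ.
TAG = re.compile(r"\[[a-z_]+\]")


def tokenize(text):
    """Split text into tokens: one token per NV tag, one per non-space character."""
    tokens = []
    pos = 0
    for match in TAG.finditer(text):
        # Characters before the tag become individual tokens (whitespace ignored).
        tokens.extend(c for c in text[pos:match.start()] if not c.isspace())
        tokens.append(match.group())
        pos = match.end()
    tokens.extend(c for c in text[pos:] if not c.isspace())
    return tokens


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def tag_aware_cer(reference, hypothesis):
    """Edit distance over tag-aware tokens, normalized by reference length."""
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)
```

Under these assumptions, comparing the instruction text `"hi [laugh]"` against a transcript of the synthesized audio that reads `"hi"` gives one deletion over three reference tokens. In practice the hypothesis would come from a recognizer that can transcribe NV events, which this sketch does not cover.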

Eval Frameworks & Benchmarks, Natural Language Processing, Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
