Mar 2, 2026 · arXiv:2603.01467

Conversational Speech Naturalness Predictor

Anfeng Xu, Yashesh Gaur, Naoyuki Kanda, Zhicheng Ouyang, Kateřina Žmolíková, Desh Raj, Simone Merello, Anna Y. Sun, Ozlem Kalinli

AI Summary

This paper addresses the evaluation of conversational naturalness in multi-turn, two-speaker dialogues, which existing single-speaker naturalness predictors do not handle well. The authors show that existing naturalness estimators have low, and sometimes negative, correlation with human judgments of conversational naturalness. They then introduce a dual-channel naturalness estimator that uses pre-trained encoders and data augmentation, achieving substantially higher correlation with human ratings in both in-domain and out-of-domain settings.

Key Contribution

Existing speech naturalness predictors correlate poorly with human judgments of multi-turn conversations; a new dual-channel estimator achieves substantially higher correlation with human ratings.

Abstract

Evaluation of conversational naturalness is essential for developing human-like speech agents. However, existing speech naturalness predictors are often designed to assess utterances from a single speaker, failing to capture conversation-level naturalness qualities. In this paper, we present a framework for an automatic naturalness predictor for two-speaker, multi-turn conversations. We first show that existing naturalness estimators have low, or sometimes even negative, correlations with conversational naturalness, based on conversational recordings annotated with human ratings. We then propose a dual-channel naturalness estimator, in which we investigate multiple pre-trained encoders with data augmentation. Our proposed model achieves substantially higher correlation with human judgments compared to existing naturalness predictors for both in-domain and out-of-domain conditions.

Eval Frameworks & Benchmarks · Natural Language Processing · Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 25

Year: 2026

Venue: N/A
