Apr 23, 2026 · arXiv:2604.21481

Preferences of a Voice-First Nation: Large-Scale Pairwise Evaluation and Preference Analysis for TTS in Indian Languages

Srija Anand, Ashwin Sankar, Ishvinder Sethi, Aaditya Pareek, Kartik Rajput, Gaurav Yadav, Nikhil Narasimhan, Adish Pandya, Deepon Halder, Mohammed Safi Ur Rahman Khan, Praveen S V, Shobhit Banga, Mitesh M. Khapra

AI Summary

This paper introduces a controlled pairwise evaluation framework for multilingual TTS across 10 Indic languages, designed to reduce the high variance that linguistic diversity and multidimensional speech perception introduce into crowdsourced ratings. The authors collected over 120K pairwise comparisons from more than 1900 native raters, evaluating 7 state-of-the-art TTS systems on overall preference and six perceptual dimensions. The study uses Bradley-Terry modeling to build a multilingual leaderboard and SHAP analysis to interpret human preferences, revealing model strengths and trade-offs.

Key Contribution

Moving beyond English-centric evaluation, this study identifies which TTS systems native speakers prefer across ten Indian languages and which perceptual dimensions drive those preferences.

Abstract

Crowdsourced pairwise evaluation has emerged as a scalable approach for assessing foundation models. However, applying it to Text-to-Speech (TTS) introduces high variance due to linguistic diversity and the multidimensional nature of speech perception. We present a controlled multidimensional pairwise evaluation framework for multilingual TTS that combines linguistic control with perceptually grounded annotation. Using 5K+ native and code-mixed sentences across 10 Indic languages, we evaluate 7 state-of-the-art TTS systems and collect over 120K pairwise comparisons from over 1900 native raters. In addition to overall preference, raters provide judgments across 6 perceptual dimensions: intelligibility, expressiveness, voice quality, liveliness, noise, and hallucinations. Using Bradley-Terry modeling, we construct a multilingual leaderboard, interpret human preference using SHAP analysis, and analyze leaderboard reliability alongside model strengths and trade-offs across perceptual dimensions.
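For readers unfamiliar with Bradley-Terry modeling, the following is a minimal sketch of how pairwise preferences can be turned into a ranking. It is not the authors' implementation: it uses the standard minorization-maximization update, ignores ties, and the function name `fit_bradley_terry` and the system labels are hypothetical placeholders.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, n_iter=200, tol=1e-9):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Returns a dict mapping each system to a strength; strengths sum to 1.
    Assumption: no ties, and every system appears in at least one comparison.
    A system with zero wins is assigned strength 0.
    """
    wins = defaultdict(float)         # total wins per system
    pair_counts = defaultdict(float)  # comparisons per unordered pair
    systems = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        systems.update((winner, loser))
    systems = sorted(systems)

    # Start from uniform strengths.
    strength = {s: 1.0 / len(systems) for s in systems}
    for _ in range(n_iter):
        updated = {}
        for i in systems:
            # MM update: wins_i / sum_j n_ij / (p_i + p_j)
            denom = sum(
                pair_counts.get(frozenset((i, j)), 0.0)
                / (strength[i] + strength[j])
                for j in systems
                if j != i
            )
            updated[i] = wins[i] / denom if denom > 0 else strength[i]
        total = sum(updated.values())
        updated = {s: v / total for s, v in updated.items()}
        change = max(abs(updated[s] - strength[s]) for s in systems)
        strength = updated
        if change < tol:
            break
    return strength


if __name__ == "__main__":
    # Hypothetical votes: each tuple is (preferred system, other system).
    votes = (
        [("tts_a", "tts_b")] * 8 + [("tts_b", "tts_a")] * 2
        + [("tts_b", "tts_c")] * 7 + [("tts_c", "tts_b")] * 3
        + [("tts_a", "tts_c")] * 9 + [("tts_c", "tts_a")] * 1
    )
    scores = fit_bradley_terry(votes)
    for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.3f}")
```

Under this model, the estimated probability that system i is preferred over system j is strength[i] / (strength[i] + strength[j]), which is what makes the fitted strengths usable as a leaderboard.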

Eval Frameworks & Benchmarks · Natural Language Processing · Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 34

Year: 2026

Venue: N/A
