IIT Madras | Apr 21, 2026 | arXiv:2604.19151

Voice of India: A Large-Scale Benchmark for Real-World Speech Recognition in India

Kaushal Bhogale, Manas Dhir, Amritansh Walecha, Manmeet Kaur, Vansh Chhabra, Vanshika Chhabra, Aaditya Pareek, Hanuman Sidh, Sagar Jain, Bhaskar Singh, Utkarsh Singh, Tahir Javed, Shobhit Banga, Mitesh M. Khapra

AI Summary

The paper introduces Voice of India, a new closed-source ASR benchmark comprising 536 hours of unscripted telephonic speech across 15 major Indian languages and 139 regional clusters. The dataset addresses limitations of existing Indic ASR benchmarks by using real-world conversational speech and accounting for spelling variations common in Indian languages. Analysis of ASR performance reveals significant geographic disparities and sheds light on the impact of factors like audio quality and speaking rate on ASR accuracy.

Key Contribution

Current ASR systems stumble significantly when faced with the nuances of real-world Indian speech, as revealed by a new benchmark exposing geographic performance disparities and the impact of audio quality, speaking rate, and device type.

Abstract

Existing Indic ASR benchmarks often use scripted, clean speech and leaderboard-driven evaluation that encourages dataset-specific overfitting. In addition, strict single-reference WER penalizes natural spelling variation in Indian languages, including non-standardized spellings of code-mixed, English-origin words. To address these limitations, we introduce Voice of India, a closed-source benchmark built from unscripted telephonic conversations covering 15 major Indian languages across 139 regional clusters. The dataset contains 306,230 utterances, totaling 536 hours of speech from 36,691 speakers, with transcripts that account for spelling variations. We also analyze performance geographically at the district level, revealing disparities. Finally, we provide detailed analysis across factors such as audio quality, speaking rate, gender, and device type, highlighting where current ASR systems struggle and offering insights for improving real-world Indic ASR systems.
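To make the point about strict single-reference WER concrete, here is a minimal sketch of a variant-aware WER in Python. It is not the benchmark's actual scoring code, which the abstract does not describe. The function name `variant_aware_wer`, the representation of a reference as one set of acceptable spellings per word slot, and the romanized example sentence are all assumptions made for illustration.

```python
from typing import List, Set


def variant_aware_wer(reference: List[Set[str]], hypothesis: List[str]) -> float:
    """Word error rate where each reference slot lists its acceptable spellings.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference slots seen so far
    # and the first j hypothesis words (standard Levenshtein, row by row).
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            # A hypothesis word matches if it is ANY accepted spelling of the slot.
            match = hypothesis[j - 1] in reference[i - 1]
            curr[j] = min(
                prev[j] + 1,                        # deletion
                curr[j - 1] + 1,                    # insertion
                prev[j - 1] + (0 if match else 1),  # match or substitution
            )
        prev = curr
    return prev[m] / n


# Made-up code-mixed example: "phone" and "fone" are both accepted spellings.
hyp = "mera fone kharab hai".split()
with_variants = [{"mera"}, {"phone", "fone"}, {"kharab"}, {"hai"}]
single_ref = [{"mera"}, {"phone"}, {"kharab"}, {"hai"}]

print(variant_aware_wer(with_variants, hyp))  # 0.0
print(variant_aware_wer(single_ref, hyp))     # 0.25
```

With a single reference spelling, the hypothesis is charged one substitution out of four words even though it is a reasonable transcription. Once the alternative spelling is listed, the error disappears.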

Eval Frameworks & Benchmarks, Natural Language Processing, Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 32

Year: 2026

Venue: N/A
