DTU | Mar 5, 2026 | arXiv:2603.05267

Beyond Word Error Rate: Auditing the Diversity Tax in Speech Recognition through Dataset Cartography

Ting-Hui Cheng, L. Clemmensen, Line H. Clemmensen, Sneha Das

AI Summary

This paper critiques the reliance on Word Error Rate (WER) for evaluating ASR systems, arguing that it obscures biases against marginalized speakers. The authors introduce the Sample Difficulty Index (SDI) to quantify how demographic and acoustic factors contribute to model failure. Using data cartography with SDI and semantic metrics such as EmbER and SemDist, they reveal systemic biases and inter-model disagreements missed by WER.

Key Contribution

Semantic metrics and data cartography expose hidden biases in ASR systems that WER alone fails to capture, revealing a "diversity tax" on marginalized speakers.

Abstract

Automatic speech recognition (ASR) systems are predominantly evaluated using the Word Error Rate (WER). However, raw token-level metrics fail to capture semantic fidelity and routinely obscure the "diversity tax": the disproportionate burden on marginalized and atypical speakers due to systematic recognition failures. In this paper, we explore the limitations of relying solely on lexical counts by systematically evaluating a broader class of non-linear and semantic metrics. To enable rigorous model auditing, we introduce the sample difficulty index (SDI), a novel metric that quantifies how intrinsic demographic and acoustic factors drive model failure. By mapping SDI onto data cartography, we demonstrate that the metrics EmbER and SemDist expose hidden systemic biases and inter-model disagreements that WER ignores. Finally, our findings are first steps towards a robust audit framework for prospective safety analysis, empowering developers to audit and mitigate ASR disparities prior to deployment.
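To make the abstract's point about lexical counts concrete, the following is a minimal sketch of the standard WER definition (word-level edit distance divided by the number of reference words). It is not the authors' code, and it does not implement SDI, EmbER, or SemDist. The function name `word_error_rate` and the three example sentences are assumptions made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the previous reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences (not from the paper): one substitution each.
reference = "the patient should take the medication"
benign = "the patient should take the medicine"
harmful = "the patient should skip the medication"

wer_benign = word_error_rate(reference, benign)
wer_harmful = word_error_rate(reference, harmful)
print(f"benign:  {wer_benign:.3f}")
print(f"harmful: {wer_harmful:.3f}")
```

Both hypotheses differ from the reference by a single substituted word, so they receive the same WER even though only one of them changes the meaning of the sentence. This is the kind of gap that a semantic metric is intended to surface.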

Constitutional AI & AI Ethics | Eval Frameworks & Benchmarks | Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 23

Year: 2026

Venue: N/A
