The paper introduces the AppTek Call-Center Dialogues corpus, a new English ASR benchmark comprising spontaneous, role-played conversations across 14 English accents in service-oriented scenarios. This dataset addresses the limitations of existing corpora by providing long-form, unsegmented audio with explicit dialect annotations, crucial for evaluating ASR robustness across diverse user bases. Benchmarking open-source ASR systems reveals significant performance variations across accents and segmentation methods, highlighting the need for accent-aware ASR development.
Strong ASR accuracy on General American English does not guarantee similar accuracy on other English accents, as a new multi-accent call-center dataset shows.
Evaluating English ASR systems for conversational AI applications remains difficult, as many publicly available corpora are pre-segmented into short utterances, consist of read or prepared speech, or lack the explicit dialect annotations needed to evaluate robustness for a diverse user base. This work presents the AppTek Call-Center Dialogues corpus, a collection of spontaneous, role-played agent-customer conversations spanning fourteen English accents and sixteen service-oriented scenarios. The dataset was commissioned specifically for evaluation, and none of its audio or text was publicly available prior to release, which reduces the risk of overlap with existing large-scale pretraining corpora. We benchmark a set of open-source ASR systems under different segmentation approaches. Results show substantial variation across accents and segmentation methods, indicating that good performance on general American English benchmarks does not necessarily generalize to other accents.
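The abstract does not name its scoring metric. Word error rate (WER) is the conventional choice for ASR benchmarks, so the following is a minimal sketch, assuming WER, of how per-accent results could be aggregated. The function names, the `(accent, reference, hypothesis)` tuple layout, and the accent labels are hypothetical illustrations, not the corpus's actual format or the authors' evaluation code. Text normalization is deliberately reduced to lowercasing and whitespace splitting; a real benchmark would likely normalize punctuation, numbers, and disfluencies more carefully.

```python
from collections import defaultdict


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over words: substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = curr
    return prev[-1]


def wer_by_accent(samples):
    """Corpus-level WER per accent: total word edits / total reference words.

    `samples` is a hypothetical iterable of (accent, reference, hypothesis)
    strings, one entry per full conversation (long-form, unsegmented).
    """
    edits = defaultdict(int)
    words = defaultdict(int)
    for accent, ref, hyp in samples:
        ref_words = ref.lower().split()  # simplistic normalization (assumption)
        hyp_words = hyp.lower().split()
        edits[accent] += word_edit_distance(ref_words, hyp_words)
        words[accent] += len(ref_words)
    # Skip accents with empty references to avoid division by zero.
    return {a: edits[a] / words[a] for a in edits if words[a] > 0}


samples = [
    ("US", "thanks for calling how can I help", "thanks for calling how can I help"),
    ("Scottish", "I need to reset my password", "I need to reset my pass word"),
]
print(wer_by_accent(samples))
```

Because each conversation is scored against its full reference transcript, this sketch does not need segment-level alignment, so the same scorer can compare systems that segment long-form audio differently. The quadratic dynamic program is slow for very long transcripts; a dedicated library such as jiwer is a more practical choice at scale.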