Mar 2, 2026 · arXiv:2603.02364

When Spoof Detectors Travel: Evaluation Across 66 Languages in the Low-Resource Language Spoofing Corpus

Kirill Borodin, Vasiliy Kudryavtsev, Maxim Maslov, Mikhail Gorodnichev, Grach Mkrtchian

AI Summary

The paper introduces LRLspoof, a large-scale multilingual corpus containing 2,732 hours of synthetic speech across 66 languages generated using 24 open-source TTS systems, with a focus on low-resource languages. It benchmarks the cross-lingual performance of 11 publicly available spoof detection countermeasures using a threshold transfer approach, calibrating EER thresholds on external benchmarks and evaluating spoof rejection rate (SRR) on LRLspoof. The key finding is that significant language-dependent disparities exist in cross-lingual spoof detection, indicating that language itself contributes to domain shift.

Key Contribution

Language is an independent, underappreciated source of domain shift in spoof detection: spoof rejection varies markedly across languages even under controlled conditions.

Abstract

We introduce LRLspoof, a large-scale multilingual synthetic-speech corpus for cross-lingual spoof detection, comprising 2,732 hours of audio generated with 24 open-source TTS systems across 66 languages, including 45 low-resource languages under our operational definition. To evaluate robustness without requiring target-domain bonafide speech, we benchmark 11 publicly available countermeasures using threshold transfer: for each model we calibrate an EER operating point on pooled external benchmarks and apply the resulting threshold, reporting spoof rejection rate (SRR). Results show model-dependent cross-lingual disparity, with spoof rejection varying markedly across languages even under controlled conditions, highlighting language as an independent source of domain shift in spoof detection. The dataset is publicly available on HuggingFace (https://huggingface.co/datasets/MTUCI/LRLspoof) and ModelScope (https://modelscope.cn/datasets/lab260/LRLspoof).
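The threshold-transfer protocol described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `eer_threshold` and `spoof_rejection_rate` are hypothetical, the scores are synthetic, and the sketch assumes that a higher score means "more likely bonafide". It calibrates an EER operating point on a labeled calibration set, then applies that fixed threshold to spoof-only target data and reports the fraction of spoofs rejected.

```python
import numpy as np


def eer_threshold(bonafide_scores, spoof_scores):
    """Return (threshold, eer) at the point where FAR and FRR are closest.

    Assumption: higher score = more likely bonafide, and a trial is
    accepted as bonafide when score >= threshold.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    candidates = np.unique(np.concatenate([bonafide, spoof]))
    # False-accept rate: spoofs accepted; false-reject rate: bonafide rejected.
    far = np.array([(spoof >= t).mean() for t in candidates])
    frr = np.array([(bonafide < t).mean() for t in candidates])
    idx = int(np.argmin(np.abs(far - frr)))
    return float(candidates[idx]), float((far[idx] + frr[idx]) / 2)


def spoof_rejection_rate(spoof_scores, threshold):
    """Fraction of spoof trials scored below the transferred threshold."""
    spoof = np.asarray(spoof_scores, dtype=float)
    return float((spoof < threshold).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic calibration set standing in for pooled external benchmarks.
    cal_bonafide = rng.normal(2.0, 1.0, size=2000)
    cal_spoof = rng.normal(-2.0, 1.0, size=2000)
    threshold, eer = eer_threshold(cal_bonafide, cal_spoof)
    print(f"calibrated threshold={threshold:.3f}, EER={eer:.3%}")

    # Synthetic spoof-only target scores for two hypothetical languages;
    # the second one is shifted toward the bonafide side of the threshold.
    target_spoof = {
        "lang_a": rng.normal(-2.0, 1.0, size=1000),
        "lang_b": rng.normal(0.5, 1.0, size=1000),
    }
    for lang, scores in target_spoof.items():
        print(f"{lang}: SRR={spoof_rejection_rate(scores, threshold):.3%}")
```

Because the target side contains only spoofed audio, the threshold has to come from elsewhere; SRR then measures how many target spoofs fall on the "reject" side of that fixed operating point, without needing bonafide speech in the target language.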

Data Curation & Synthetic Data · Natural Language Processing · Speech & Audio

Year: 2026

Venue: N/A
