Mar 31, 2026, arXiv:2603.29892

FLEURS-Kobani: Extending the FLEURS Dataset for Northern Kurdish

AI Summary

The authors introduce FLEURS-Kobani, a Northern Kurdish extension of the FLEURS dataset comprising 5,162 validated utterances (18 hours 24 minutes) recorded by 31 native speakers. They establish baselines for automatic speech recognition (ASR) and end-to-end speech-to-text translation (S2TT) by fine-tuning Whisper v3-large, reaching a WER of 28.11 (CER 9.84) for ASR and a BLEU score of 8.68 for Northern Kurdish to English S2TT on the test set. The dataset is the first public Northern Kurdish benchmark for these tasks.

Key Contribution

FLEURS-Kobani is the first public Northern Kurdish benchmark for ASR and speech translation, extending FLEURS coverage to an under-resourced Kurdish variety and providing fine-tuned Whisper baselines for both tasks.

Abstract

FLEURS offers n-way parallel speech for 100+ languages, but Northern Kurdish is not one of them, which limits benchmarking for automatic speech recognition (ASR) and speech translation tasks in this language. We present FLEURS-Kobani, a Northern Kurdish (ISO 639-3 KMR) spoken extension of the FLEURS benchmark. The FLEURS-Kobani dataset consists of 5,162 validated utterances, totaling 18 hours and 24 minutes, recorded by 31 native speakers. It extends benchmark coverage to an under-resourced Kurdish variety. As baselines, we fine-tuned Whisper v3-large for ASR and end-to-end speech-to-text translation (E2E S2TT). A two-stage fine-tuning strategy (Common Voice to FLEURS-Kobani) yields the best ASR performance (WER 28.11, CER 9.84 on test). For E2E S2TT (KMR to EN), Whisper achieves 8.68 BLEU on test; we additionally report pivot-derived targets and a cascaded S2TT setup. FLEURS-Kobani provides the first public Northern Kurdish benchmark for evaluation of ASR, S2TT, and speech-to-speech translation (S2ST) tasks. The dataset is publicly released for research use under a CC BY 4.0 license.
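The ASR results above are reported as word error rate (WER) and character error rate (CER). As a rough illustration of what those numbers measure, here is a minimal sketch that computes both at corpus level from edit distance. It assumes plain whitespace tokenization and no text normalization; the scoring toolkit and normalization actually used by the authors are not stated here, so the function names and the toy inputs are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, tokenize):
    """Corpus-level error rate in percent: total edits / total reference tokens."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors = 0
    ref_tokens = 0
    for ref, hyp in zip(refs, hyps):
        ref_seq, hyp_seq = tokenize(ref), tokenize(hyp)
        errors += edit_distance(ref_seq, hyp_seq)
        ref_tokens += len(ref_seq)
    if ref_tokens == 0:
        raise ValueError("references contain no tokens")
    return 100.0 * errors / ref_tokens


def wer(refs, hyps):
    """Word error rate, assuming whitespace tokenization (an assumption)."""
    return error_rate(refs, hyps, str.split)


def cer(refs, hyps):
    """Character error rate over raw characters, spaces included (an assumption)."""
    return error_rate(refs, hyps, list)


# Toy placeholder transcripts, not taken from the dataset.
references = ["a b c d"]
hypotheses = ["a x c d"]
print(wer(references, hypotheses))
print(cer(references, hypotheses))
```

Under this definition, a WER of 28.11 corresponds to roughly 28 word-level edits per 100 reference words. Because CER counts edits over characters, a word with a single wrong letter is one full word error but only one character error, which is consistent with the reported CER (9.84) being much lower than the WER.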

Eval Frameworks & Benchmarks, Natural Language Processing, Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
