The authors introduce BlasBench, an open benchmark and evaluation harness for Irish Automatic Speech Recognition (ASR) systems, featuring Irish-aware text normalization. They benchmark 12 ASR systems across four architecture families on Common Voice ga-IE and FLEURS ga-IE datasets, revealing significant performance disparities. The finding that models fine-tuned on Common Voice suffer a 33-43 WER point increase on FLEURS highlights a critical generalization gap often masked by single-dataset evaluations.
Fine-tuning ASR models on Common Voice can give a misleading picture of quality: the same models' WER rises by 33-43 points on FLEURS.
No open Irish-specific benchmark compares end-user ASR systems under a shared Irish-aware evaluation protocol. To address this gap, we release BlasBench, an open evaluation harness with Irish-aware text normalisation that preserves fadas, lenition, and eclipsis. We benchmark 12 systems across four architecture families on Common Voice ga-IE and FLEURS ga-IE. All Whisper variants exceed 100% WER. The best open model (omniASR LLM 7B) achieves 30.65% WER on Common Voice and 39.09% on FLEURS. Models fine-tuned on Common Voice degrade by 33-43 WER points on FLEURS, revealing a generalisation gap that is invisible to single-dataset evaluation.
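To make two points in the abstract concrete, here is a minimal Python sketch of a fada-preserving normaliser and a word-level WER. It is not BlasBench's actual implementation: the function names `normalise_irish` and `wer`, the punctuation rule, and the example sentences are all assumptions chosen for illustration. It shows why WER can exceed 100%: inserted words count as errors, but the denominator is only the reference length.

```python
import re
import unicodedata


def normalise_irish(text: str) -> str:
    """Hypothetical normaliser: lowercases and strips punctuation, keeps fadas."""
    # NFC composes "a" + combining acute into a single "á", so accented
    # vowels compare equal regardless of how the input encoded them.
    text = unicodedata.normalize("NFC", text).lower()
    # Drop punctuation but keep letters (including á, é, í, ó, ú),
    # hyphens (as in "t-uisce"), and apostrophes (as in "d'ól").
    text = re.sub(r"[^\w\s'-]", " ", text)
    return " ".join(text.split())


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Standard Levenshtein distance over words, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


reference = normalise_irish("Tá sé fuar.")
# A hypothesis twice as long as the reference, with no matching words.
print(wer(reference, "the say for a while now"))  # 2.0, i.e. 200% WER
```

Because fadas are kept, a hypothesis that drops them ("ta se fuar") is scored as two substitutions against "tá sé fuar" rather than as a perfect match. Lenition and eclipsis survive for the same reason: the sketch never rewrites letters, so "bhean" and "mbord" stay distinct from "bean" and "bord".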