This paper explores continued pretraining (CPT) of wav2vec2-bert-2.0 for low-resource Swahili ASR by leveraging unlabeled audio and a small amount of labeled data via pseudo-labeling. The CPT approach is followed by supervised fine-tuning, significantly improving performance. The method achieves a 3.24% WER on Common Voice Swahili using only 20,000 labeled samples, outperforming previous state-of-the-art systems by a large margin.
You can slash ASR error rates in low-resource languages by over 60% with a simple continued pretraining recipe.
We investigate continued pretraining (CPT) for adapting wav2vec2-bert-2.0 to Swahili automatic speech recognition (ASR). Our approach combines unlabeled audio with limited labeled data through pseudo-labeled CPT followed by supervised fine-tuning. With 20,000 labeled samples, we achieve 3.24% WER on Common Voice Swahili, an 82% relative improvement over the baseline. This result surpasses the best previously reported academic system (8.3% WER from XLS-R) by a 61% relative margin. We provide concrete data requirements and a replicable methodology applicable to other low-resource languages.
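As a rough illustration of the pseudo-labeling stage described above, the sketch below uses the HuggingFace Transformers Wav2Vec2BertForCTC API to transcribe unlabeled 16 kHz audio with a seed model fine-tuned on the small labeled set. The checkpoint path is hypothetical, and the paper's exact CPT objective, decoding setup, and hyperparameters are not reproduced here.

```python
# Minimal pseudo-labeling sketch (assumptions: HuggingFace Transformers >= 4.37,
# a seed w2v-bert-2.0 CTC model already fine-tuned on the ~20k labeled samples;
# the checkpoint path below is hypothetical).
import torch
from transformers import AutoProcessor, Wav2Vec2BertForCTC

SEED_CKPT = "path/to/seed-w2v-bert-swahili-ctc"  # hypothetical checkpoint

processor = AutoProcessor.from_pretrained(SEED_CKPT)
model = Wav2Vec2BertForCTC.from_pretrained(SEED_CKPT).eval()

@torch.no_grad()
def pseudo_label(waveform_16khz: torch.Tensor) -> str:
    """Greedy CTC decode of one 16 kHz mono waveform into a pseudo-transcript."""
    inputs = processor(audio=waveform_16khz.numpy(), sampling_rate=16_000,
                       return_tensors="pt")
    logits = model(**inputs).logits          # (batch, time, vocab)
    ids = torch.argmax(logits, dim=-1)       # greedy token ids per frame
    return processor.batch_decode(ids)[0]    # collapse repeats / blanks to text
```

The resulting pseudo-transcripts would then serve as targets for the continued pretraining pass over the unlabeled Swahili audio, before the final supervised fine-tuning on the labeled set.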