Mar 2, 2026 · arXiv:2603.02368

RO-N3WS: Enhancing Generalization in Low-Resource ASR with Diverse Romanian Speech Benchmarks

Alexandra Diaconu, Mădălina Vînaga, Bogdan Alexe

AI Summary

The paper introduces RO-N3WS, a Romanian speech dataset of over 126 hours spanning broadcast news, audiobooks, film dialogue, children's stories, and podcasts, built to improve ASR generalization in low-resource settings. The authors evaluate Whisper and Wav2Vec 2.0 in zero-shot and fine-tuned settings and run controlled comparisons against synthetic data generated with TTS. Fine-tuning on RO-N3WS substantially improves WER over the zero-shot baselines, which highlights the dataset's value for domain adaptation.

Key Contribution

Romanian ASR gets a boost: fine-tuning existing models on a new, diverse speech dataset sharply reduces WER relative to zero-shot baselines, with real speech reported to outperform synthetic data.

Abstract

We introduce RO-N3WS, a benchmark Romanian speech dataset designed to improve generalization in automatic speech recognition (ASR), particularly in low-resource and out-of-distribution (OOD) conditions. RO-N3WS comprises over 126 hours of transcribed audio collected from broadcast news, literary audiobooks, film dialogue, children's stories, and conversational podcast speech. This diversity enables robust training and fine-tuning across stylistically distinct domains. We evaluate several state-of-the-art ASR systems (Whisper, Wav2Vec 2.0) in both zero-shot and fine-tuned settings, and conduct controlled comparisons using synthetic data generated with expressive TTS models. Our results show that even limited fine-tuning on real speech from RO-N3WS yields substantial WER improvements over zero-shot baselines. We will release all models, scripts, and data splits to support reproducible research in multilingual ASR, domain adaptation, and lightweight deployment.
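The comparisons above are reported in terms of word error rate (WER). As a minimal sketch of how that metric is usually computed, the function below divides the word-level edit distance between a reference transcript and a model hypothesis by the number of reference words. It is not the authors' evaluation script: the function name `word_error_rate`, the lowercase-and-split-on-whitespace normalization, and the sample sentences are all assumptions made for illustration, and the paper may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: transcripts are normalized by lowercasing and splitting on
    whitespace only; punctuation handling is left out of this sketch.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts, not taken from RO-N3WS.
    reference = "astăzi este o zi frumoasă"
    hypothesis = "astăzi e o zi frumoasă"
    print(f"WER = {word_error_rate(reference, hypothesis):.2f}")
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words, which is worth keeping in mind when reading zero-shot results on out-of-distribution audio.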

Data Curation & Synthetic Data · Eval Frameworks & Benchmarks · Speech & Audio

Year: 2026

Venue: N/A
