Apr 21, 2026 · arXiv:2604.18932

Tadabur: A Large-Scale Quran Audio Dataset

AI Summary

The paper introduces Tadabur, a new large-scale Quran audio dataset designed to address the limited scale and diversity of existing Quran datasets. Tadabur contains more than 1,400 hours of recitation audio from over 600 reciters, with substantial variation in recitation styles, vocal characteristics, and recording conditions. The dataset aims to support future research in Quranic speech analysis and to facilitate the development of standardized benchmarks.

Key Contribution

A Quran audio dataset of more than 1,400 hours from over 600 reciters, substantially expanding the scale and diversity of data available for Quranic speech research and benchmarking.

Abstract

Despite growing interest in Quranic data research, existing Quran datasets remain limited in both scale and diversity. To address this gap, we present Tadabur, a large-scale Quran audio dataset. Tadabur comprises more than 1,400 hours of recitation audio from over 600 distinct reciters, providing substantial variation in recitation styles, vocal characteristics, and recording conditions. This diversity makes Tadabur a comprehensive and representative resource for Quranic speech research and analysis. By significantly expanding both the total duration and the variability of available Quran data, Tadabur aims to support future research and facilitate the development of standardized Quranic speech benchmarks.

Data Curation & Synthetic Data · Natural Language Processing · Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 16

Year: 2026

Venue: N/A
