Feb 26, 2026 · arXiv:2602.22522

Efficient Dialect-Aware Modeling and Conditioning for Low-Resource Taiwanese Hakka Speech Processing

Antao Peng, An-Ci Peng, Kuan-Tang Huang, Tien-Hong Lo, Hung-Shin Lee, Berlin Chen

AI Summary

The paper introduces a dialect-aware RNN-T framework for low-resource Taiwanese Hakka ASR that explicitly models dialectal variation to disentangle dialectal "style" from linguistic "content." The authors use parameter-efficient prediction networks to model Hanzi and Pinyin ASR concurrently, with cross-script learning acting as a mutual regularizer. Experiments on the HAT corpus show relative error rate reductions of 57.00% and 40.41% on Hanzi and Pinyin ASR, respectively.
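The summary does not say what makes the prediction networks "parameter-efficient." One plausible reading, assumed here rather than confirmed, is that both scripts share a single prediction-network body while keeping script-specific embedding tables. The sketch counts parameters for that shared design versus two fully separate single-layer LSTM prediction networks. All function names, vocabulary sizes, and dimensions are hypothetical, and the LSTM count uses one bias vector per gate (PyTorch's `nn.LSTM` keeps two, which adds `4 * hidden_dim` per layer).

```python
from typing import Tuple


def lstm_params(input_dim: int, hidden_dim: int) -> int:
    """Parameters of one LSTM layer with a single bias vector per gate."""
    # 4 gates, each with input weights, recurrent weights, and a bias.
    return 4 * (hidden_dim * (input_dim + hidden_dim) + hidden_dim)


def embedding_params(vocab_size: int, emb_dim: int) -> int:
    """Parameters of a token embedding table."""
    return vocab_size * emb_dim


def separate_vs_shared(hanzi_vocab: int, pinyin_vocab: int,
                       emb_dim: int, hidden_dim: int) -> Tuple[int, int]:
    """Return (separate, shared) prediction-network parameter counts.

    Hypothetical designs: 'separate' gives each script its own embedding
    table and LSTM; 'shared' keeps per-script embeddings but reuses one LSTM.
    """
    body = lstm_params(emb_dim, hidden_dim)
    embeddings = (embedding_params(hanzi_vocab, emb_dim)
                  + embedding_params(pinyin_vocab, emb_dim))
    separate = embeddings + 2 * body
    shared = embeddings + body
    return separate, shared


if __name__ == "__main__":
    # Illustrative sizes only; not taken from the paper.
    separate, shared = separate_vs_shared(hanzi_vocab=4000, pinyin_vocab=1500,
                                          emb_dim=256, hidden_dim=640)
    print(f"separate={separate:,} shared={shared:,} saved={separate - shared:,}")
```

Under these assumptions the saving is exactly one LSTM body, so it grows with the hidden size. The script-specific output projections, which in an RNN-T live in the joint network, are left out for brevity.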

Key Contribution

Achieves up to 57% relative error reduction in low-resource Taiwanese Hakka ASR by disentangling dialectal style from linguistic content with a dialect-aware RNN-T framework.
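Relative error reduction is measured against the baseline's own error rate, so a 57% reduction means the new error rate is 43% of the baseline's, whatever the absolute values are. A minimal helper with a hypothetical function name; the example error rates are made up, since the summary does not report the baseline or final error rates:

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Fraction of the baseline error removed: (baseline - new) / baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical error rates in percent, chosen only to illustrate the arithmetic.
    print(f"{relative_error_reduction(20.0, 8.6):.2%}")  # prints 57.00%
```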

Abstract

Taiwanese Hakka is a low-resource, endangered language that poses significant challenges for automatic speech recognition (ASR), including high dialectal variability and the presence of two distinct writing systems (Hanzi and Pinyin). Traditional ASR models often struggle in this setting because they tend to conflate essential linguistic content with dialect-specific variation along both phonological and lexical dimensions. To address these challenges, we propose a unified framework built on the Recurrent Neural Network Transducer (RNN-T). Central to our approach are dialect-aware modeling strategies designed to disentangle dialectal "style" from linguistic "content", which enhances the model's capacity to learn robust, generalizable representations. Additionally, the framework employs parameter-efficient prediction networks to model Hanzi and Pinyin ASR concurrently. We demonstrate that these tasks create a powerful synergy, in which the cross-script objective serves as a mutual regularizer that improves the primary ASR tasks. Experiments on the HAT corpus show that our model achieves relative error rate reductions of 57.00% and 40.41% on Hanzi and Pinyin ASR, respectively. To our knowledge, this is the first systematic investigation into the impact of Hakka dialectal variation on ASR and the first single model capable of jointly addressing these tasks.
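The abstract does not spell out how dialect information enters the model. One simple way to condition an RNN-T on a categorical attribute is to add a learned embedding of that attribute to the encoder output before the joint network. The plain-Python sketch below shows that idea for a single (frame, label-history) pair. The additive conditioning, the dialect keys (Sixian and Hailu, two Hakka varieties spoken in Taiwan), and all weights are illustrative assumptions, not the paper's reported design.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(*vectors: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vectors)]


def log_softmax(logits: Vector) -> Vector:
    """Numerically stable log-softmax."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return [z - log_norm for z in logits]


def dialect_conditioned_joint(enc_frame: Vector, pred_state: Vector, dialect: str,
                              dialect_emb: Dict[str, Vector], w_enc: Matrix,
                              w_pred: Matrix, w_out: Matrix) -> Vector:
    """Log-probabilities over output labels (one per row of w_out) for one (t, u) pair.

    Assumption: dialect information is injected by adding a dialect vector to
    the encoder frame; the paper may use a different conditioning scheme.
    """
    conditioned = add(enc_frame, dialect_emb[dialect])
    # Standard additive RNN-T joint: tanh(W_enc f_t + W_pred g_u), then project.
    hidden = [math.tanh(h) for h in add(matvec(w_enc, conditioned),
                                        matvec(w_pred, pred_state))]
    return log_softmax(matvec(w_out, hidden))


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    w_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 labels, e.g. blank + 2 tokens
    emb = {"sixian": [0.5, 0.0], "hailu": [0.0, 0.5]}  # hypothetical dialect vectors
    for name in emb:
        log_probs = dialect_conditioned_joint([0.1, 0.2], [0.0, 0.0], name, emb,
                                              identity, identity, w_out)
        print(name, [round(math.exp(lp), 3) for lp in log_probs])
```

Because only the dialect vector changes between the two calls, the same acoustic frame yields a different label distribution per dialect. In a trained system, `dialect_emb` and the weight matrices would be learned jointly under the transducer loss.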

Data Curation & Synthetic Data · Natural Language Processing · Speech & Audio

Citation Metrics

Citations: 0
Influential citations: 0
References: 26
Year: 2026
Venue: N/A
