Aug 4, 2025

MERaLiON-AudioLLM: Advancing Speech and Language Understanding for Singapore

Yingxu He, Zhuohan Liu, Geyu Lin, Shuo Sun, Bin Wang, Wenyu Zhang, Xunlong Zou, Nancy F. Chen, AiTi Aw

AI Summary

The authors introduce MERaLiON-AudioLLM, an audio-based large language model trained on 62 million multimodal instruction samples (260k hours of audio) to understand Singlish and perform diverse audio-based tasks. The model addresses the lack of region-specific AI that can handle colloquial and code-switched language. MERaLiON-AudioLLM shows competitive performance against existing open-source models in ASR, spoken question answering, speech translation, and paralinguistic analysis, with its largest gains in local speech recognition.

Key Contribution

Singlish, the colloquial, code-switched variety of English spoken in Singapore, now has a dedicated audio-based LLM that achieves significant gains over existing open-source models on local speech recognition tasks.

Abstract

We introduce MERaLiON-AudioLLM, the first general-purpose multitask audio-based large language model designed to understand Singlish, a colloquial and code-switched variety of English spoken in Singapore. Trained on 62 million multimodal instruction samples spanning over 260,000 hours of audio, MERaLiON-AudioLLM exhibits strong performance across diverse tasks including automatic speech recognition, spoken question answering, speech translation, and paralinguistic analysis. We benchmark MERaLiON-AudioLLM across a broad range of multilingual and multi-task scenarios, and it demonstrates competitive performance against existing open-source models. The model achieves significant gains in local speech recognition and task-specific understanding, underscoring its utility for region-specific AI applications. We develop an interactive demo interface to enable user-friendly access, supported by a back-end with custom caching and load-balancing mechanisms. The interactive demos, model weights and video are publicly available for both the first release of MERaLiON-AudioLLM and the recent second release of MERaLiON-2. This paper focuses exclusively on the development and evaluation of the first release.

Multimodal Models, Natural Language Processing, Speech & Audio

Citation Metrics

Citations: 3

Influential citations: 2

References: 22

Year: 2025

Venue: Annual Meeting of the Association for Computational Linguistics
