The authors introduce MERaLiON-AudioLLM, a large language model trained on 62 million multimodal instruction samples (over 260,000 hours of audio) to understand Singlish and perform diverse audio-based tasks. The model addresses a gap in region-specific AI capable of understanding colloquial and code-switched language. MERaLiON-AudioLLM demonstrates competitive performance in ASR, spoken question answering, speech translation, and paralinguistic analysis, and does particularly well on local speech recognition compared with existing open-source models.
Singlish, the colloquial, code-switched variety of English spoken in Singapore, now has a dedicated audio-based LLM that shows significant gains over existing open-source models on local speech recognition tasks.
We introduce MERaLiON-AudioLLM, the first general-purpose multitask audio-based large language model designed to understand Singlish, a colloquial and code-switched variety of English spoken in Singapore. Trained on 62 million multimodal instruction samples spanning over 260,000 hours of audio, MERaLiON-AudioLLM exhibits strong performance across diverse tasks including automatic speech recognition, spoken question answering, speech translation, and paralinguistic analysis. We benchmark MERaLiON-AudioLLM across a broad range of multilingual and multitask scenarios, and it demonstrates competitive performance against existing open-source models. The model achieves significant gains in local speech recognition and task-specific understanding, underscoring its utility for region-specific AI applications. We develop an interactive demo interface to enable user-friendly access, supported by a back-end with custom caching and load-balancing mechanisms. The interactive demos, model weights, and video are publicly available for both the first release, MERaLiON-AudioLLM, and the recent second release, MERaLiON-2. This paper focuses exclusively on the development and evaluation of the first release.