The paper introduces Aladdin-FTI, a system for dialectal Arabic (DA) generation and translation that addresses the under-representation of Arabic dialects in NLP. The system builds on recent advances in Large Language Models (LLMs) to model Arabic as a pluricentric language rather than a single monolithic system. Aladdin-FTI supports text generation in five dialects (Moroccan, Egyptian, Palestinian, Syrian, and Saudi) and bidirectional translation between these dialects, Modern Standard Arabic (MSA), and English.
Aladdin-FTI tackles the under-representation of Arabic dialects in NLP with a single model that generates text in multiple dialects and translates between those dialects, MSA, and English.
Arabic dialects have long been under-represented in Natural Language Processing (NLP) research because their lack of standardization and high variability pose challenges for computational modeling. Recent advances in the field, such as Large Language Models (LLMs), offer promising avenues to address this gap by enabling Arabic to be modeled as a pluricentric language rather than a monolithic system. This paper presents Aladdin-FTI, our submission to the AMIYA shared task. The proposed system is designed to both generate and translate dialectal Arabic (DA). Specifically, the model supports text generation in Moroccan, Egyptian, Palestinian, Syrian, and Saudi dialects, as well as bidirectional translation between these dialects, Modern Standard Arabic (MSA), and English. The code and trained model are publicly available.