Xi'an Jiaotong-Liverpool University · Apr 6, 2026 · arXiv:2604.04839

MERIT: Multilingual Expert-Reward Informed Tuning for Chinese-Centric Low-Resource Machine Translation

Zhixiang Lu, Chong Zhang, Angelos Stefanidis, Chong Li, Jionglong Su, Zhengyong Jiang

AI Summary

The paper introduces MERIT, a new framework for Chinese-to-low-resource Southeast Asian language translation that addresses the scarcity of clean parallel corpora. MERIT combines language-specific token prefixing, supervised fine-tuning, and a novel group relative policy optimization guided by a semantic alignment reward. Experiments on five languages demonstrate that MERIT significantly outperforms existing methods, showing the importance of targeted data curation and reward-guided optimization over simply scaling model size.
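The summary does not say which prefix tokens MERIT uses or where they are placed. The sketch below is a minimal illustration under explicit assumptions. It assumes a hypothetical `<2xx>`-style tag that marks the target language and is prepended to the source sentence. The `LANG_TOKENS` table, the tag strings, and `build_sft_example` are all illustrative and do not come from the paper. Only the three languages named in the abstract are listed, plus Chinese.

```python
from dataclasses import dataclass

# Hypothetical language-prefix tokens (assumption: format and codes are
# illustrative, not the tokens used by MERIT).
LANG_TOKENS = {
    "zh": "<2zh>",  # Chinese
    "lo": "<2lo>",  # Lao
    "my": "<2my>",  # Burmese
    "tl": "<2tl>",  # Tagalog
}


@dataclass
class SFTExample:
    prompt: str  # source sentence with the language prefix attached
    target: str  # reference translation the model is trained to produce


def add_language_prefix(text: str, lang: str) -> str:
    """Prepend the (hypothetical) language-specific token for `lang` to `text`."""
    try:
        token = LANG_TOKENS[lang]
    except KeyError:
        raise ValueError(f"unsupported language code: {lang!r}") from None
    return f"{token} {text}"


def build_sft_example(source: str, target: str, target_lang: str) -> SFTExample:
    """Build one supervised fine-tuning pair whose prompt carries the target-language tag."""
    return SFTExample(prompt=add_language_prefix(source, target_lang), target=target)


if __name__ == "__main__":
    ex = build_sft_example("你好，世界。", "(Lao reference translation)", "lo")
    print(ex.prompt)  # <2lo> 你好，世界。
```

The sketch routes every example through one lookup table. This keeps the tag vocabulary explicit, so an unsupported language code raises an error instead of silently producing an untagged prompt.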

Key Contribution

Rather than relying on brute-force scaling, targeted data curation and reward-guided optimization substantially improve Chinese-to-Southeast Asian language translation, outperforming larger models.

Abstract

Neural machine translation (NMT) from Chinese to low-resource Southeast Asian languages remains severely constrained by the extreme scarcity of clean parallel corpora and the pervasive noise in existing mined data. This chronic shortage not only impedes effective model training but also sustains a large performance gap with high-resource directions, leaving millions of speakers of languages such as Lao, Burmese, and Tagalog with persistently low-quality translation systems despite recent advances in large multilingual models. We introduce Multilingual Expert-Reward Informed Tuning (MERIT), a unified translation framework that transforms the traditional English-centric ALT (Asian Language Treebank) benchmark into a Chinese-centric evaluation suite for five Southeast Asian low-resource languages (LRLs). Our framework combines language-specific token prefixing (LTP) with supervised fine-tuning (SFT) and a novel group relative policy optimization (GRPO) stage guided by a semantic alignment reward (SAR). Experiments on these five languages confirm that, in LRL→Chinese translation, targeted data curation and reward-guided optimization dramatically outperform mere model scaling.
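The abstract names GRPO and a semantic alignment reward (SAR) but gives no formulas. The sketch below shows the group-relative idea under stated assumptions. In the standard GRPO formulation, several candidate translations are sampled for the same source and each is scored. Each candidate's advantage is its reward minus the group mean, divided by the group standard deviation: A_i = (r_i − mean(r)) / (std(r) + ε). The SAR here is a stand-in that assumes the reward is the cosine similarity between sentence embeddings of a candidate and the reference. `semantic_alignment_reward` and the toy two-dimensional embeddings are hypothetical, and the embedding model itself is omitted. The clipped policy ratio and KL penalty of the full GRPO objective are also left out.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_alignment_reward(hyp_emb: Sequence[float], ref_emb: Sequence[float]) -> float:
    """Hypothetical SAR stand-in: embedding similarity between hypothesis and reference."""
    return cosine_similarity(hyp_emb, ref_emb)


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: center each reward on the group mean, scale by group std.

    Uses the population standard deviation; eps avoids division by zero when
    all candidates in the group receive the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    reference = [1.0, 0.0]  # toy reference embedding
    candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy candidate embeddings
    rewards = [semantic_alignment_reward(c, reference) for c in candidates]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Because advantages are computed relative to the other samples for the same source sentence, no separate value network is needed. Candidates that align better with the reference than their siblings receive positive advantages, and the rest receive negative ones.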

Data Curation & Synthetic Data · Natural Language Processing · Training Efficiency & Optimization

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
