Fudan · Tongji · Xiaohongshu · University of California · May 23, 2026 · arXiv:2605.24681

Mix-MoE: Improving Multilingual Machine Translation of Large Language Models through Mixed MoEs

AI Summary

The paper introduces Mix-MoE, a Mixture-of-Experts framework for fine-tuning LLMs for multilingual machine translation. It mitigates parameter interference by separating language modeling knowledge and translation knowledge into distinct expert groups. Training uses a two-stage post-pretraining approach with specialized LM and MT experts: MoE training first on monolingual data, then on parallel corpora. A routing mechanism enhanced with Fourier Transform features facilitates interaction between the two expert groups.
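The summary names the data used in each stage but not which parameters are updated. The sketch below shows one possible schedule in plain Python. It assumes, and this is not stated in the source, that stage 1 updates only the LM experts and the router, and that stage 2 freezes the LM experts so they "retain" monolingual knowledge while the MT experts and router train. The names `Stage`, `SCHEDULE` and `sgd_step` are hypothetical, and the scalar "weights" are toy stand-ins for real parameter tensors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    corpus: str           # data source for this stage
    trainable: frozenset  # parameter groups that receive updates


# Hypothetical schedule: the source only says stage 1 uses monolingual data
# and stage 2 uses parallel data; which groups are frozen is an assumption.
SCHEDULE = (
    Stage("stage1", "monolingual", frozenset({"lm_experts", "router"})),
    Stage("stage2", "parallel", frozenset({"mt_experts", "router"})),
)


def sgd_step(params, grads, stage, lr=0.1):
    """Return updated params; only the stage's trainable groups move."""
    return {
        group: [w - lr * g for w, g in zip(weights, grads[group])]
        if group in stage.trainable
        else list(weights)
        for group, weights in params.items()
    }


if __name__ == "__main__":
    # Toy scalar "weights" per group, with a constant fake gradient.
    params = {"lm_experts": [1.0, 1.0], "mt_experts": [0.0, 0.0], "router": [0.5]}
    grads = {"lm_experts": [1.0, 1.0], "mt_experts": [1.0, -1.0], "router": [1.0]}
    for stage in SCHEDULE:
        params = sgd_step(params, grads, stage)
        print(stage.name, stage.corpus, params)
```

When you run it, the LM expert weights change only in stage 1. They stay fixed while the MT experts train in stage 2. Under this assumed schedule, freezing is how monolingual knowledge would be protected from interference.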

Key Contribution

LLMs can learn multilingual translation more effectively when language modeling knowledge and translation knowledge are explicitly separated into distinct expert groups during fine-tuning, with a router coordinating between them.

Abstract

Large Language Models (LLMs) have shown great promise in multilingual machine translation (MT), even with limited bilingual supervision. However, fine-tuning LLMs on parallel corpora presents a major challenge: parameter interference. To address it, we propose Mix-MoE, a mixed Mixture-of-Experts framework designed to train LLMs for multilingual MT. Our framework operates in two distinct stages: (1) post-pretraining with MoE on monolingual corpora, and (2) post-pretraining with MoE on parallel corpora. Crucially, we divide the experts in the MoE layers into two specialized groups: Language Model Experts (LM Experts) and Machine Translation Experts (MT Experts). LM Experts are designed to capture and retain the monolingual knowledge learned by the pre-trained LLM, whereas MT Experts are trained specifically to acquire and store bilingual translation knowledge. Furthermore, to facilitate effective interaction between these specialized experts and to exploit potential structural patterns in text, we introduce a routing mechanism enhanced by Fourier Transform features derived from model representations. Experimental results demonstrate that Mix-MoE excels in multilingual MT, significantly outperforming existing baselines and making notable progress in mitigating parameter interference.
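The abstract does not say how the Fourier features are computed or how they enter the router. The sketch below is one possible reading, written in dependency-free Python. Every detail here is an assumption and none of it comes from the paper:

- The features are the real part of a DFT over the sequence axis, in the spirit of FNet's Fourier token mixing.
- The router scores each token on its hidden state concatenated with that token's Fourier feature.
- Softmax top-k gating selects from a single pool that contains both LM and MT experts.
- Each expert is a single linear map.

The names `MixedMoELayer`, `dft_real_along_sequence`, `top_k`, `n_lm` and `n_mt` are hypothetical.

```python
import cmath
import math
import random


def dft_real_along_sequence(x):
    """FNet-style mixing: real part of a 1-D DFT over the sequence axis,
    computed independently for each hidden dimension. x is a T x d list."""
    n_tok, d = len(x), len(x[0])
    out = [[0.0] * d for _ in range(n_tok)]
    for k in range(n_tok):
        for h in range(d):
            s = sum(x[t][h] * cmath.exp(-2j * math.pi * k * t / n_tok)
                    for t in range(n_tok))
            out[k][h] = s.real
    return out


def matvec(w, v):
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def softmax(z):
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    total = sum(e)
    return [ei / total for ei in e]


class MixedMoELayer:
    """Toy mixed MoE layer: LM experts and MT experts (each a d x d linear map)
    share one router whose input is [x_t ; F_t], the token state concatenated
    with its Fourier feature."""

    def __init__(self, d, n_lm=2, n_mt=2, top_k=2, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.experts = {"lm": [rand(d, d) for _ in range(n_lm)],
                        "mt": [rand(d, d) for _ in range(n_mt)]}
        # Flat index over both groups, e.g. ("lm", 0), ..., ("mt", 1).
        self.names = [(g, i) for g in ("lm", "mt") for i in range(len(self.experts[g]))]
        self.router = rand(len(self.names), 2 * d)
        self.top_k = top_k

    def __call__(self, x):
        fourier = dft_real_along_sequence(x)
        outputs, routes = [], []
        for x_t, f_t in zip(x, fourier):
            probs = softmax(matvec(self.router, x_t + f_t))  # list concat = [x_t ; F_t]
            top = sorted(range(len(probs)), key=lambda i: -probs[i])[: self.top_k]
            norm = sum(probs[i] for i in top)  # renormalise over the selected experts
            y = [0.0] * len(x_t)
            for i in top:
                group, idx = self.names[i]
                expert_out = matvec(self.experts[group][idx], x_t)
                y = [yi + (probs[i] / norm) * ei for yi, ei in zip(y, expert_out)]
            outputs.append(y)
            routes.append([self.names[i] for i in top])
        return outputs, routes


if __name__ == "__main__":
    tokens = [[0.1, -0.3, 0.2, 0.0], [0.4, 0.1, -0.2, 0.3], [-0.1, 0.2, 0.5, -0.4]]
    layer = MixedMoELayer(d=4)
    y, routes = layer(tokens)
    for t, r in enumerate(routes):
        print(f"token {t} -> {r}")
```

Because the Fourier feature of each position depends on the whole sequence, a token's routing decision can reflect sequence-level structure, not only its own hidden state. The naive DFT here costs O(T²·d). A practical implementation would use an FFT routine such as `numpy.fft.fft` or `torch.fft.fft` instead.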

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Training Efficiency & Optimization

Citation Metrics

Citations: 0 · Influential citations: 0 · References: 0 · Year: 2026 · Venue: N/A
