Tsinghua AI · BUPT · HKUST · Zhipu · May 25, 2026 · arXiv:2605.25565

RotMoLE: Enhancing Mixture of Low-Rank Experts through Rotational Gating Mechanism

Mengyang Sun, Maochuan Dou, Yihao Wang, Junpeng Liu, Yifan Zhu, Jie Tang

AI Summary

This paper introduces RotMoLE, a Mixture-of-Experts (MoE) framework tailored to low-rank adapters that adds a rotational gating mechanism to strengthen expert specialization. Conventional MoE gating only reweights each selected expert by a scalar. RotMoLE additionally applies a rotation to each selected expert's output. Experiments on multi-task and multilingual training show that RotMoLE improves performance by making better use of a limited pool of expert candidates.
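The contrast between scalar and rotational gating can be written out for a single layer. The notation is ours, not the paper's: W_0 is the frozen base weight, expert i is a rank-r adapter B_i A_i, T_k(x) is the set of top-k experts selected for input x, and g_i(x) is the scalar gate. The summary says only that the rotation acts on each selected expert's output. As an assumption, the sketch places an input-dependent rotation R_i(x) in SO(r) inside the rank-r bottleneck (between A_i and B_i), where the low-rank structure makes a rotation cheap. The paper may apply it differently.

```latex
\begin{aligned}
\text{scalar gating:}\quad
  h &= W_0 x + \sum_{i \in \mathcal{T}_k(x)} g_i(x)\, B_i A_i x \\
\text{rotational gating (assumed form):}\quad
  h &= W_0 x + \sum_{i \in \mathcal{T}_k(x)} g_i(x)\, B_i R_i(x) A_i x,
  \qquad R_i(x)^{\top} R_i(x) = I_r,\ \det R_i(x) = 1
\end{aligned}
```

Because the rotation preserves the norm of the r-dimensional code A_i x, the magnitude of each expert's contribution is still set by g_i(x). Under scalar gating, expert i always applies the same linear map B_i A_i and the gate can only rescale its output. Under this assumed rotational form, B_i R_i(x) A_i is an input-dependent map that still has rank at most r. This is one way a small number of experts could cover more diverse inputs.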

Key Contribution

RotMoLE's rotational gating extracts more representational power from low-rank MoE architectures, even when only a few expert candidates are available.

Abstract

While Large Language Models (LLMs) are commonly fine-tuned for domain-specific tasks before being deployed in vertical applications, adapting them to complex scenarios that require diverse specialized knowledge remains challenging. Meanwhile, the Mixture-of-Experts (MoE) architecture has emerged as a key paradigm for training LLMs. Several recent works have also incorporated MoE into Parameter-Efficient Fine-Tuning (PEFT), proposing the Mixture of Low-rank Experts (MoE-LoRA) to strengthen low-rank adapters for learning complex knowledge. However, conventional MoE gating mechanisms typically apply only a scalar reweighting to the selected experts, which limits their capacity for representation and generalization. Motivated and enabled by the low-rank structure of MoE-LoRA, we propose RotMoLE, a specialized MoE framework for low-rank experts that features an additional rotation gate. Beyond simple scaling, RotMoLE applies a rotation to each selected expert. This enables better expert exploitation and specialization when learning diverse data, especially when expert candidates are limited. Empirical results on complex multi-task and multilingual training scenarios validate its effectiveness.
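The abstract describes an MoE-LoRA layer in which each selected expert receives a rotation gate in addition to its scalar weight. The NumPy sketch below is illustrative and is not the authors' implementation. Several details are assumptions:

- The rotation acts on the expert's rank-r code, between A and B.
- The rotation is input-dependent and predicted by a hypothetical per-expert router `W_rot`.
- SO(r) is parameterized with the Cayley transform, a standard construction that the paper may not use.
- The gate is a softmax over the top-k logits.

The names `cayley_rotation`, `init_params`, and `moe_lora_forward` are hypothetical.

```python
import numpy as np


def cayley_rotation(params: np.ndarray, r: int) -> np.ndarray:
    """Map r*(r-1)/2 free parameters to a matrix in SO(r) via the Cayley transform."""
    S = np.zeros((r, r))
    S[np.triu_indices(r, k=1)] = params
    S = S - S.T                      # skew-symmetric: S^T = -S
    I = np.eye(r)
    # R = (I - S)^{-1} (I + S) is orthogonal with det(R) = +1 for any skew-symmetric S
    return np.linalg.solve(I - S, I + S)


def init_params(d_in, d_out, n_experts, r, seed=0):
    """Random parameters for illustration only (LoRA usually zero-initializes B)."""
    rng = np.random.default_rng(seed)
    n_angles = r * (r - 1) // 2
    return {
        "W0": rng.normal(scale=d_in ** -0.5, size=(d_out, d_in)),        # frozen base weight
        "A": rng.normal(scale=d_in ** -0.5, size=(n_experts, r, d_in)),  # expert down-projections
        "B": rng.normal(scale=r ** -0.5, size=(n_experts, d_out, r)),    # expert up-projections
        "W_gate": rng.normal(size=(n_experts, d_in)),                    # scalar router
        # hypothetical rotation router: input-dependent Cayley parameters per expert
        "W_rot": rng.normal(scale=0.1, size=(n_experts, n_angles, d_in)),
    }


def moe_lora_forward(x, p, top_k=2, rotate=True):
    """Map one token x of shape (d_in,) to an output of shape (d_out,)."""
    r = p["A"].shape[1]
    logits = p["W_gate"] @ x
    top = np.argsort(logits)[-top_k:]             # indices of the top-k experts
    g = np.exp(logits[top] - logits[top].max())
    g = g / g.sum()                               # softmax over the selected experts only
    h = p["W0"] @ x
    for gi, i in zip(g, top):
        z = p["A"][i] @ x                         # r-dim low-rank code of expert i
        if rotate:
            R = cayley_rotation(p["W_rot"][i] @ x, r)
            z = R @ z                             # rotate inside the rank-r bottleneck
        h = h + gi * (p["B"][i] @ z)              # scalar gate still sets the magnitude
    return h


p = init_params(d_in=16, d_out=8, n_experts=4, r=4)
x = np.random.default_rng(1).normal(size=16)
y_scalar = moe_lora_forward(x, p, rotate=False)   # conventional scalar gating
y_rot = moe_lora_forward(x, p, rotate=True)       # scalar gate plus rotation gate
```

Setting `rotate=False` recovers plain top-k scalar gating. With `W_rot` set to zero, the Cayley map returns the identity, so the two paths coincide. The extra cost per selected expert is r(r-1)/2 router outputs and one r×r linear solve, which stays small because r is the adapter rank.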

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing · Training Efficiency & Optimization

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
