Mar 9, 2026 | arXiv:2603.13364

FineRMoE: Dimension Expansion for Finer-Grained Expert with Its Upcycling Approach

Ning Liao, Xiaoxing Wang, Xiaohan Qin, Junchi Yan

AI Summary

The paper introduces FineRMoE, a novel Mixture-of-Experts (MoE) architecture that extends fine-grained expert design to both intermediate and output dimensions to overcome performance limitations of single-dimension fine-grained MoEs. A bi-level sparse forward computation paradigm and specialized routing mechanism are introduced to manage expert activation. To reduce training costs, a generalized upcycling method is proposed to build FineRMoE efficiently.

Key Contribution

Compared with the strongest baseline, FineRMoE achieves 6x higher parameter efficiency, 281x lower prefill latency, and 136x higher decoding throughput during inference.

Abstract

As revealed by the scaling law of fine-grained MoE, model performance stops improving once the granularity of the intermediate dimension exceeds the optimal threshold, limiting further gains from single-dimension fine-grained design. To address this bottleneck, we propose FineRMoE (FineR-Grained MoE), an architecture that extends fine-grained expert design to both the intermediate and output dimensions, aiming to enhance expert specialization beyond the single-dimension limit. We further introduce a bi-level sparse forward computation paradigm and a specialized routing mechanism to govern expert activation. In addition, to avoid the prohibitive cost of training FineRMoE from scratch, we devise a generalized upcycling method to build FineRMoE in a cost-effective manner. Extensive experiments demonstrate the superior performance of FineRMoE across ten standard benchmarks. Compared with the strongest baseline, FineRMoE achieves 6 times higher parameter efficiency, 281 times lower prefill latency, and 136 times higher decoding throughput during inference.
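For readers unfamiliar with the term, the following is a minimal sketch of what "granularity of the intermediate dimension" means in a conventional fine-grained MoE. It is not the FineRMoE method: the abstract does not specify the output-dimension expansion, the bi-level sparse forward pass, or the routing mechanism, so none of those are modeled here. The sketch assumes a two-matrix ReLU feed-forward layer and a softmax top-k router; all function names (`dense_ffn`, `split_experts`, `moe_forward`) and sizes are hypothetical and chosen only for illustration.

```python
import math
import random


def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * a for w, a in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def dense_ffn(W1, W2, x):
    # W1: d_ff x d_model (up-projection), W2: d_model x d_ff (down-projection).
    return matvec(W2, relu(matvec(W1, x)))


def split_experts(W1, W2, num_experts):
    # Slice the intermediate dimension d_ff into equal contiguous chunks.
    # A larger num_experts means a finer granularity (smaller experts).
    d_ff = len(W1)
    assert d_ff % num_experts == 0, "d_ff must be divisible by num_experts"
    size = d_ff // num_experts
    experts = []
    for e in range(num_experts):
        lo, hi = e * size, (e + 1) * size
        W1_e = W1[lo:hi]                   # rows of the up-projection
        W2_e = [row[lo:hi] for row in W2]  # matching columns of the down-projection
        experts.append((W1_e, W2_e))
    return experts


def moe_forward(experts, router, x, top_k):
    # Assumed router: softmax over linear logits, keep the top_k experts,
    # and weight each selected expert's output by its router probability.
    logits = matvec(router, x)
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    out = [0.0] * len(x)
    for i in chosen:
        W1_e, W2_e = experts[i]
        y = dense_ffn(W1_e, W2_e, x)
        out = [o + probs[i] * v for o, v in zip(out, y)]
    return out, chosen


if __name__ == "__main__":
    random.seed(0)
    d_model, d_ff, num_experts = 4, 8, 4  # illustrative sizes only
    W1 = [[random.uniform(-1, 1) for _ in range(d_model)] for _ in range(d_ff)]
    W2 = [[random.uniform(-1, 1) for _ in range(d_ff)] for _ in range(d_model)]
    router = [[random.uniform(-1, 1) for _ in range(d_model)] for _ in range(num_experts)]
    x = [random.uniform(-1, 1) for _ in range(d_model)]

    experts = split_experts(W1, W2, num_experts)
    out, chosen = moe_forward(experts, router, x, top_k=2)
    print("active experts:", chosen)
    print("sparse output:", out)
```

Because the activation is elementwise, the unweighted sum of all expert outputs reproduces the dense layer exactly, which is why slicing a pretrained dense layer is a common starting point for building an MoE. Whether the paper's generalized upcycling method works this way is not stated in the abstract.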

Architecture Design (Transformers, SSMs, MoE)

Scaling Laws & Emergent Abilities

Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
