Munich Center for Machine Learning

Apr 1, 2026

arXiv:2604.00801

Routing-Free Mixture-of-Experts

Yilun Liu, Jinru Han, Sikuan Yan, Volker Tresp, Yunpu Ma

AI Summary

This paper introduces Routing-Free Mixture-of-Experts (MoE), which removes centralized routing mechanisms and instead lets each expert determine its own activation, optimized through continuous gradient flow. The authors also introduce a unified adaptive load-balancing framework that optimizes both expert balancing and token balancing. Experiments show that Routing-Free MoE outperforms baselines, with better scalability and robustness.

Key Contribution

Ditch the router: this MoE architecture lets experts decide when to activate, leading to better scalability and robustness.

Abstract

Standard Mixture-of-Experts (MoE) models rely on centralized routing mechanisms that introduce rigid inductive biases. We propose Routing-Free MoE, which eliminates all hard-coded centralized designs, including external routers, Softmax, Top-K, and load balancing. Instead, all activation functionality is encapsulated within individual experts and optimized directly through continuous gradient flow, enabling each expert to determine its activation entirely on its own. We introduce a unified adaptive load-balancing framework that simultaneously optimizes both expert-balancing and token-balancing objectives through a configurable interpolation, allowing flexible and customizable resource allocation. Extensive experiments show that Routing-Free MoE consistently outperforms baselines, with better scalability and robustness. We analyze its behavior in detail and offer insights that may facilitate future MoE design and optimization.
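The abstract does not give formulas, so the following is only an illustrative sketch of the two ideas it names, not the authors' implementation. It assumes each expert scores a token with its own sigmoid gate (no shared router, Softmax, or Top-K), and that the balancing objective is a linear interpolation, controlled by a hypothetical weight `alpha`, between an expert-balancing term and a token-balancing term. The function names, the sigmoid gate, and the variance-based penalties are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def self_gates(tokens, gate_vectors, biases):
    """Return gates[t][e]: expert e's own activation for token t.

    Assumption: every expert owns one gate vector and bias and scores
    each token independently, so no normalization couples the experts.
    """
    gates = []
    for x in tokens:
        row = []
        for w, b in zip(gate_vectors, biases):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            row.append(sigmoid(score))
        gates.append(row)
    return gates


def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def balance_penalty(gates, alpha):
    """Hypothetical interpolated balancing penalty.

    alpha = 1.0 penalizes only uneven expert load (mean gate per expert);
    alpha = 0.0 penalizes only uneven token load (total gate per token).
    """
    num_tokens = len(gates)
    num_experts = len(gates[0])
    expert_load = [
        sum(gates[t][e] for t in range(num_tokens)) / num_tokens
        for e in range(num_experts)
    ]
    token_load = [sum(row) for row in gates]
    return alpha * variance(expert_load) + (1.0 - alpha) * variance(token_load)


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    gate_vectors = [[2.0, -2.0], [-2.0, 2.0]]  # one vector per expert
    biases = [0.0, 0.0]
    gates = self_gates(tokens, gate_vectors, biases)
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, balance_penalty(gates, alpha))
```

In this sketch, moving `alpha` toward 1 favors spreading activation evenly across experts, while moving it toward 0 favors giving each token a similar total amount of expert activation; the framework described in the abstract may define both terms differently.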

Architecture Design (Transformers, SSMs, MoE)

Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
