This paper introduces Global Expert Mapping (GEM), a planner-compiler framework for multi-domain learning that replaces the learned router in Mixture-of-Experts (MoE) models with a global scheduler based on linear programming. GEM optimizes for domain-aware routing by computing a fractional assignment of datasets to experts, then uses hierarchical rounding to create a deterministic mapping, avoiding the balancing loss that hinders specialization in standard MoEs. Experiments on UODB demonstrate that GEM-DINO achieves state-of-the-art performance, particularly on underrepresented datasets and in few-shot adaptation.
Ditch the learned router: a global scheduler for Mixture-of-Experts models unlocks state-of-the-art multi-domain learning by explicitly optimizing dataset-to-expert assignments.
Human perception generalizes well across different domains, but most vision models struggle beyond their training data. This gap motivates multi-dataset learning, where a single model is trained on diverse datasets to improve robustness under domain shifts. However, unified training remains challenging due to inconsistencies in data distributions and label semantics. Mixture-of-Experts (MoE) models provide a scalable solution by routing inputs to specialized subnetworks (experts). Yet existing MoEs often fail to specialize effectively, as their load-balancing mechanisms enforce uniform input distribution across experts. This fairness conflicts with domain-aware routing, causing experts to learn redundant representations and reducing performance, especially on rare or out-of-distribution domains. We propose GEM (Global Expert Mapping), a planner-compiler framework that replaces the learned router with a global scheduler. Our planner, based on linear programming relaxation, computes a fractional assignment of datasets to experts, while the compiler applies hierarchical rounding to convert this soft plan into a deterministic, capacity-aware mapping. Unlike prior MoEs, GEM avoids the balancing loss, resolves the conflict between fairness and specialization, and produces interpretable routing. Experiments show that GEM-DINO achieves state-of-the-art performance on the UODB benchmark, with notable gains on underrepresented datasets, and mitigates task interference in few-shot adaptation scenarios.
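The planner-compiler idea can be sketched in miniature: solve a relaxed linear program that fractionally assigns datasets to experts under capacity constraints, then round the fractional plan into a hard, capacity-aware mapping. This is a minimal illustration only — the cost matrix, the capacity model, and the greedy rounding rule below are assumptions for exposition, not the paper's exact formulation of GEM or its hierarchical rounding.

```python
# Illustrative sketch of an LP-relaxation "planner" and a rounding "compiler"
# for dataset-to-expert assignment. Costs, capacities, and the greedy rounding
# rule are hypothetical stand-ins, not the paper's actual method.
import numpy as np
from scipy.optimize import linprog

def plan(cost, capacity):
    """Planner: solve the relaxed assignment LP.
    min sum_ij cost[i,j] * x[i,j]
    s.t. sum_j x[i,j] = 1          (each dataset fully assigned)
         sum_i x[i,j] <= capacity[j]  (expert load limit)
         0 <= x <= 1."""
    d, e = cost.shape
    # Equality rows: each dataset's assignment mass sums to 1.
    A_eq = np.zeros((d, d * e))
    for i in range(d):
        A_eq[i, i * e:(i + 1) * e] = 1.0
    # Inequality rows: total mass on each expert stays within capacity.
    A_ub = np.zeros((e, d * e))
    for j in range(e):
        A_ub[j, j::e] = 1.0
    res = linprog(cost.ravel(), A_ub=A_ub, b_ub=np.asarray(capacity, float),
                  A_eq=A_eq, b_eq=np.ones(d), bounds=(0, 1))
    return res.x.reshape(d, e)  # fractional plan

def compile_plan(frac, capacity):
    """Compiler: round the fractional plan to a deterministic mapping.
    Datasets with the most concentrated mass are placed first; each takes
    its most-preferred expert that still has capacity."""
    load = np.zeros(frac.shape[1])
    assign = {}
    for i in np.argsort(-frac.max(axis=1)):      # most decided datasets first
        for j in np.argsort(-frac[i]):           # preferred experts in order
            if load[j] + 1 <= capacity[j]:
                assign[i] = j
                load[j] += 1
                break
    return assign

cost = np.array([[0.1, 0.9],   # 3 datasets x 2 experts: lower = better fit
                 [0.8, 0.2],
                 [0.3, 0.4]])
frac = plan(cost, capacity=[2, 2])
mapping = compile_plan(frac, capacity=[2, 2])
print(mapping)  # deterministic dataset -> expert routing table
```

Because the routing table is computed once, globally, there is no auxiliary balancing loss to tune at training time, and the resulting dataset-to-expert map is directly inspectable.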