Microsoft Research | Beihang | Feb 26, 2026 | arXiv:2602.22938

pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation

Shentong Mo, Xufang Luo, Dongsheng Li

AI Summary

The paper introduces pMoE, a parameter-efficient fine-tuning method for visual adaptation that uses a Mixture-of-Experts architecture with expert-specialized prompt tokens and a learnable dispatcher. The approach addresses a limitation of prompt tuning that relies on a single pre-trained model by integrating knowledge from diverse domains during adaptation. Experiments across 47 classification and segmentation tasks show that pMoE outperforms existing methods and offers a better trade-off between computational efficiency and adaptation effectiveness.

Key Contribution

Forget monolithic models: pMoE shows that combining prompts from diverse domain experts within a single model framework yields large gains in visual adaptation across 47 classification and segmentation tasks.

Abstract

Parameter-efficient fine-tuning has demonstrated promising results across various visual adaptation tasks, such as classification and segmentation. Typically, prompt tuning techniques have harnessed knowledge from a single pre-trained model, whether from a general or a specialized medical domain. However, this approach overlooks the potential synergies that could arise from integrating diverse domain knowledge within the same tuning process. In this work, we propose a novel Mixture-of-Experts prompt tuning method called pMoE, which leverages the strengths of multiple expert domains through expert-specialized prompt tokens and a learnable dispatcher, effectively combining their expertise in a unified model framework. Our pMoE introduces expert-specific prompt tokens and utilizes a dynamic token dispatching mechanism at various prompt layers to optimize the contribution of each domain expert during the adaptation phase. By incorporating domain knowledge from diverse experts, the proposed pMoE significantly enhances the model's versatility and applicability to a broad spectrum of tasks. We conduct extensive experiments across 47 adaptation tasks, including both classification and segmentation in general and medical domains. The results demonstrate that our pMoE not only achieves superior performance by a large margin but also offers an optimal trade-off between computational efficiency and adaptation effectiveness compared to existing methods.
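
The abstract does not spell out how the dispatcher combines expert prompts, so the following is only a minimal sketch of one plausible reading: each expert contributes its own prompt tokens, and a per-token softmax over dispatcher logits decides how much each expert contributes. The function name `dispatch_prompts`, the softmax weighting, and the toy "general" and "medical" experts are assumptions made for illustration, not the paper's implementation. In a real model the logits would be learned parameters at each prompt layer.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dispatch_prompts(expert_prompts, dispatcher_logits):
    """Mix per-expert prompt tokens into one set of prompt tokens.

    expert_prompts:    list over experts; each entry is a list of prompt
                       tokens, and each token is a list of floats.
    dispatcher_logits: list over prompt positions; each entry holds one
                       logit per expert (hypothetical stand-in for the
                       learnable dispatcher).
    """
    num_experts = len(expert_prompts)
    num_tokens = len(expert_prompts[0])
    dim = len(expert_prompts[0][0])

    mixed = []
    for t in range(num_tokens):
        # Per-token weights over experts; they sum to 1.
        weights = softmax(dispatcher_logits[t])
        token = [
            sum(weights[e] * expert_prompts[e][t][k] for e in range(num_experts))
            for k in range(dim)
        ]
        mixed.append(token)
    return mixed


# Toy example: two experts, two prompt tokens, embedding size 2.
general_prompts = [[1.0, 0.0], [2.0, 2.0]]
medical_prompts = [[3.0, 4.0], [0.0, 2.0]]

# Equal logits give every expert the same weight for each token.
uniform_logits = [[0.0, 0.0], [0.0, 0.0]]
mixed_prompts = dispatch_prompts([general_prompts, medical_prompts], uniform_logits)
```

With equal logits the mixed prompts are the plain average of the two experts' tokens. Skewing a token's logits toward one expert makes that token nearly identical to that expert's prompt, which is the sense in which a dispatcher can tune each expert's contribution.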

Architecture Design (Transformers, SSMs, MoE) | Computer Vision | Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 60

Year: 2026

Venue: International Conference on Learning Representations
