PKU | Jun 11, 2026 | arXiv:2606.13024

CausalMoE: A Billion-Scale Multimodal Foundation Model for Granger Causal Discovery with Pattern-Routed Heterogeneous Experts

Bo Liu, Di Dai, Jingwei Liu, Jiarui Jin, Xiaocheng Fang, Guangkun Nie, Hongyan Li, Shenda Hong

AI Summary

This paper introduces CausalMoE, a billion-scale multimodal foundation model designed for Granger Causal Discovery (GCD) that addresses the limitations of existing neural GCD methods by employing a Pattern-Routed Mixture of Heterogeneous Experts. By dynamically routing data to specialized experts based on latent temporal patterns, CausalMoE effectively mitigates issues related to distribution shifts and regime changes, resulting in clearer causal graphs. Extensive experiments show that CausalMoE achieves state-of-the-art performance on fully supervised benchmarks and excels in few-shot scenarios where traditional approaches struggle.

Key Contribution

CausalMoE not only sets a new benchmark for Granger causal discovery but also excels in few-shot learning, revealing the power of heterogeneous expert routing in complex temporal analyses.

Abstract

Granger Causal Discovery (GCD) is fundamental for analyzing temporal dependencies in complex systems. However, existing neural GCD methods predominantly rely on a "one-size-fits-all" paradigm, struggling to capture distribution shifts and dynamic regime changes inherent in real-world time series. This often leads to entangled representations and spurious causal graphs. In this paper, we propose CausalMoE, a billion-scale multimodal Granger causal foundation model that explicitly models patch-level heterogeneity. CausalMoE introduces a Pattern-Routed Mixture of Heterogeneous Experts, which dynamically identifies latent temporal patterns and routes patches to specialized domain experts, effectively decoupling regime-specific mechanisms from shared dynamics. To ensure interpretable graph recovery, we design a Causality-Aware Self-Attention mechanism operating across variables, yielding sparse Granger causal graphs via proximal optimization. Furthermore, CausalMoE is the first to integrate LLMs and VLMs to align numerical signals with textual and visual priors, regularizing causal estimation in complex scenarios. Extensive experiments demonstrate that CausalMoE establishes a new state of the art on fully supervised benchmarks, while effectively generalizing to few-shot settings where traditional methods fail.
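The abstract mentions recovering sparse Granger causal graphs via proximal optimization but gives no formulation. As a rough illustration of that general idea only, the sketch below fits a linear lag-1 vector autoregression with an L1 penalty using proximal gradient descent (ISTA) and reads the graph off the nonzero coefficients. It is a simple linear stand-in, not the paper's Causality-Aware Self-Attention mechanism; the function names, the lag-1 assumption, the penalty weight `lam`, and the threshold `tol` are all hypothetical choices made for this example.

```python
import numpy as np


def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1 (elementwise shrinkage toward zero)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def simulate_var(A, T, noise_std=1.0, seed=0):
    """Simulate x_t = A x_{t-1} + noise for T steps (illustrative data only)."""
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    X = np.zeros((T, d))
    for t in range(1, T):
        X[t] = A @ X[t - 1] + noise_std * rng.standard_normal(d)
    return X


def sparse_var_granger(X, lam=0.1, n_iter=500):
    """Fit x_t ~ A x_{t-1} with an L1 penalty on A via ISTA.

    Minimizes (1 / 2n) * ||future - past @ A.T||_F^2 + lam * ||A||_1.
    Assumption: a linear lag-1 model, chosen for brevity.
    """
    past, future = X[:-1], X[1:]
    n, d = past.shape
    A = np.zeros((d, d))
    # Step size 1/L, where L is the Lipschitz constant of the smooth term's gradient.
    lipschitz = np.linalg.norm(past, 2) ** 2 / n
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        resid = past @ A.T - future      # shape (n, d)
        grad = resid.T @ past / n        # shape (d, d)
        A = soft_threshold(A - step * grad, step * lam)
    return A


def granger_graph(A, tol=0.1):
    """Boolean adjacency: entry [i, j] is True if variable j helps predict variable i."""
    return np.abs(A) > tol
```

In this convention a nonzero `A[i, j]` is read as "variable `j` Granger-causes variable `i`" under the linear model. The L1 penalty shrinks weak coefficients to zero, which is what makes the recovered graph sparse; a neural model would replace the linear map but could apply the same proximal step to its input-layer or attention weights.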

Architecture Design (Transformers, SSMs, MoE) | Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 40

Year: 2026

Venue: N/A
