This paper introduces CausalMoE, a billion-scale multimodal foundation model designed for Granger Causal Discovery (GCD) that addresses the limitations of existing neural GCD methods by employing a Pattern-Routed Mixture of Heterogeneous Experts. By dynamically routing data to specialized experts based on latent temporal patterns, CausalMoE effectively mitigates issues related to distribution shifts and regime changes, resulting in clearer causal graphs. Extensive experiments show that CausalMoE achieves state-of-the-art performance on fully supervised benchmarks and excels in few-shot scenarios where traditional approaches struggle.
CausalMoE not only sets a new benchmark for Granger causal discovery but also excels in few-shot learning, revealing the power of heterogeneous expert routing in complex temporal analyses.
Granger Causal Discovery (GCD) is fundamental for analyzing temporal dependencies in complex systems. However, existing neural GCD methods predominantly rely on a "one-size-fits-all" paradigm, struggling to capture distribution shifts and dynamic regime changes inherent in real-world time series. This often leads to entangled representations and spurious causal graphs. In this paper, we propose CausalMoE, a billion-scale multimodal Granger causal foundation model that explicitly models patch-level heterogeneity. CausalMoE introduces a Pattern-Routed Mixture of Heterogeneous Experts, which dynamically identifies latent temporal patterns and routes patches to specialized domain experts, effectively decoupling regime-specific mechanisms from shared dynamics. To ensure interpretable graph recovery, we design a Causality-Aware Self-Attention mechanism operating across variables, yielding sparse Granger causal graphs via proximal optimization. Furthermore, CausalMoE is the first to integrate LLMs and VLMs to align numerical signals with textual and visual priors, regularizing causal estimation in complex scenarios. Extensive experiments demonstrate that CausalMoE establishes a new state-of-the-art on fully supervised benchmarks, while effectively generalizing to few-shot settings where traditional methods fail.
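The abstract does not say how proximal optimization produces the sparse graph. One common choice, assumed here purely for illustration, is the proximal operator of an L1 penalty (soft-thresholding) applied to a matrix of cross-variable scores. In the sketch below, the function names, the `scores` matrix, and the threshold `lam` are all hypothetical and are not taken from the paper.

```python
import numpy as np

def soft_threshold(scores, lam):
    """Proximal operator of lam * ||.||_1: shrink each entry toward zero by lam."""
    return np.sign(scores) * np.maximum(np.abs(scores) - lam, 0.0)

def granger_graph(scores, lam):
    """Binary adjacency: edge j -> i is kept when the shrunk score is nonzero."""
    shrunk = soft_threshold(scores, lam)
    return (shrunk != 0).astype(int)

# Hypothetical cross-variable scores: entry [i, j] is the strength of j -> i.
# These numbers are made up for the example, not results from the paper.
scores = np.array([[0.90, 0.05, 0.40],
                   [0.02, 0.80, 0.03],
                   [0.30, 0.01, 0.70]])
adjacency = granger_graph(scores, lam=0.1)
```

Entries whose magnitude does not exceed `lam` are set exactly to zero, which is why a proximal step can recover a sparse graph directly. A plain gradient step on an L1 penalty would only shrink those entries to small nonzero values.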