The paper introduces ITS-Mina, an all-MLP architecture for multivariate time series forecasting that uses iterative refinement with a shared-parameter residual mixer stack to deepen temporal representation learning. It also incorporates an external attention mechanism with learnable memory units for linear-complexity cross-sample dependency capture and employs Harris Hawks Optimization (HHO) for adaptive dropout rate tuning. Experiments on six benchmark datasets show ITS-Mina achieves state-of-the-art or competitive performance against Transformer-based and other MLP baselines.
Ditch the Transformers: a cleverly designed all-MLP architecture, ITS-Mina, rivals state-of-the-art time series forecasting while slashing computational costs.
Multivariate time series forecasting plays a pivotal role in numerous real-world applications, including financial analysis, energy management, and traffic planning. While Transformer-based architectures have gained popularity for this task, recent studies reveal that simpler MLP-based models can achieve competitive or superior performance with significantly reduced computational cost. In this paper, we propose ITS-Mina, a novel all-MLP framework for multivariate time series forecasting that integrates three key innovations: (1) an iterative refinement mechanism that progressively enhances temporal representations by repeatedly applying a shared-parameter residual mixer stack, effectively deepening the model's computational capacity without multiplying the number of distinct parameters; (2) an external attention module that replaces traditional self-attention with learnable memory units, capturing cross-sample global dependencies at linear computational complexity; and (3) a Harris Hawks Optimization (HHO) algorithm for automatic dropout rate tuning, enabling adaptive regularization tailored to each dataset. Extensive experiments on six widely used benchmark datasets demonstrate that ITS-Mina achieves state-of-the-art or highly competitive performance compared to eleven baseline models across multiple forecasting horizons.
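The first two ideas in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract does not give layer definitions, so the block layout, the ReLU activation, the softmax over memory slots, and all names (`init_block`, `mixer_block`, `iterative_refinement`, `external_attention`, `count_params`) are assumptions made for illustration. The sketch shows two things: reusing one stack of residual MLP blocks for several passes adds depth without adding parameters, and attending to a fixed set of S memory slots costs O(N · S · d) for N tokens rather than O(N² · d). HHO-based dropout tuning is not shown.

```python
import math
import random


def linear(x, weight, bias):
    """Dense layer: weight is (out_dim x in_dim), x has in_dim entries."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def init_block(dim, hidden, rng, scale=0.1):
    """Random parameters for one residual MLP block (hypothetical layout)."""
    def mat(rows, cols):
        return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]
    return {"w1": mat(hidden, dim), "b1": [0.0] * hidden,
            "w2": mat(dim, hidden), "b2": [0.0] * dim}


def mixer_block(x, p):
    """Residual block: x + W2 * relu(W1 * x + b1) + b2."""
    h = [max(0.0, v) for v in linear(x, p["w1"], p["b1"])]
    return [xi + yi for xi, yi in zip(x, linear(h, p["w2"], p["b2"]))]


def iterative_refinement(x, stack, n_iters):
    """Apply the same stack of blocks n_iters times; the weights are reused."""
    for _ in range(n_iters):
        for p in stack:
            x = mixer_block(x, p)
    return x


def external_attention(tokens, mem_k, mem_v):
    """Attend from each token to S learnable memory slots, not to other tokens.

    Simplification (assumption): a plain softmax over the S slots per token.
    """
    out = []
    for t in tokens:
        scores = [sum(a * b for a, b in zip(t, k)) for k in mem_k]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, mem_v))
                    for j in range(len(mem_v[0]))])
    return out


def count_params(stack):
    """Number of scalar parameters in the stack, independent of n_iters."""
    total = 0
    for p in stack:
        total += sum(len(row) for row in p["w1"]) + sum(len(row) for row in p["w2"])
        total += len(p["b1"]) + len(p["b2"])
    return total


rng = random.Random(0)
stack = [init_block(dim=4, hidden=8, rng=rng) for _ in range(2)]
x = [0.5, -1.0, 0.25, 2.0]

# Three passes through the same two blocks: six block applications in total,
# but the parameter count is still that of two blocks.
refined = iterative_refinement(x, stack, n_iters=3)
print(len(refined), count_params(stack))
```

In this sketch the effective depth is `n_iters * len(stack)` block applications, while `count_params(stack)` depends only on `len(stack)`, which is the trade-off the abstract describes as deepening computation "without multiplying the number of distinct parameters".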