Mar 5, 2026 · arXiv:2603.04791

Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling

Yong Liu, Xin Su, Xingjian Su, Shiyu Wang, Haoran Zhang, Haixuan Liu, Yuxuan Wang, Zhou Ye, Yang Xiang, Jianmin Wang, Mingsheng Long

AI Summary

Timer-S1, a Mixture-of-Experts time series foundation model with 8.3B parameters, achieves state-of-the-art forecasting performance by employing Serial Scaling across model architecture, dataset, and training pipeline. The model integrates sparse TimeMoE and TimeSTP blocks for Serial-Token Prediction (STP), a training objective designed to improve long-term predictions. A post-training stage, including continued pre-training and long-context extension, further enhances Timer-S1's performance on both short-term and long-context forecasting tasks.
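
To make concrete the rolling-style inference that the summary contrasts with STP, here is a minimal Python sketch of a generic autoregressive forecast loop. It is not Timer-S1's code and does not implement STP, which this page does not specify. The names `rolling_forecast` and `biased_last_value` are hypothetical, and the toy one-step model with a constant bias of 0.1 is an assumption chosen only to show how a per-step error compounds when predictions are fed back as inputs.

```python
from typing import Callable, List, Sequence, Tuple


def rolling_forecast(
    context: Sequence[float],
    predict_next: Callable[[Sequence[float]], float],
    horizon: int,
) -> Tuple[List[float], int]:
    """Forecast `horizon` steps by feeding each prediction back as input.

    Returns the forecasts and the number of model calls made.
    """
    history = list(context)
    forecasts: List[float] = []
    calls = 0
    for _ in range(horizon):
        y_hat = predict_next(history)  # one model call per forecast step
        calls += 1
        forecasts.append(y_hat)
        history.append(y_hat)  # the prediction becomes part of the input
    return forecasts, calls


def biased_last_value(history: Sequence[float]) -> float:
    """Toy one-step model (assumption): last value plus a constant bias."""
    return history[-1] + 0.1


# The true series is constant at 1.0, so every deviation is model error.
forecasts, calls = rolling_forecast([1.0, 1.0, 1.0], biased_last_value, horizon=5)
errors = [abs(f - 1.0) for f in forecasts]
```

In this toy setting the loop needs one model call per forecast step, and the absolute error grows with every step because each biased prediction is reused as input. These are the two costs, rolling inference and error accumulation, that the summary attributes to standard next-token prediction.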

Key Contribution

Timer-S1 is a time series foundation model that attains the best MASE and CRPS scores among pre-trained models on the GIFT-Eval leaderboard. It does so by applying Serial Scaling jointly across three dimensions: model architecture, training dataset, and training pipeline.

Abstract

We introduce Timer-S1, a strong Mixture-of-Experts (MoE) time series foundation model with 8.3B total parameters, 0.75B activated parameters for each token, and a context length of 11.5K. To overcome the scalability bottleneck in existing pre-trained time series foundation models, we perform Serial Scaling in three dimensions: model architecture, dataset, and training pipeline. Timer-S1 integrates sparse TimeMoE blocks and generic TimeSTP blocks for Serial-Token Prediction (STP), a generic training objective that adheres to the serial nature of forecasting. The proposed paradigm introduces serial computations to improve long-term predictions while avoiding costly rolling-style inference and pronounced error accumulation in the standard next-token prediction. Pursuing a high-quality and unbiased training dataset, we curate TimeBench, a corpus with one trillion time points, and apply meticulous data augmentation to mitigate predictive bias. We further pioneer a post-training stage, including continued pre-training and long-context extension, to enhance short-term and long-context performance. Evaluated on the large-scale GIFT-Eval leaderboard, Timer-S1 achieves state-of-the-art forecasting performance, attaining the best MASE and CRPS scores as a pre-trained model. Timer-S1 will be released to facilitate further research.
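
For readers unfamiliar with the two metrics named in the abstract, the following Python sketch gives their textbook definitions. It is an illustration under stated assumptions, not the leaderboard's evaluation code: the function names `mase` and `crps_from_samples` are hypothetical, the CRPS is estimated from forecast samples, and GIFT-Eval's exact computation and aggregation may differ (for example, by approximating CRPS from quantiles).

```python
from typing import Sequence


def mase(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    y_train: Sequence[float],
    season: int = 1,
) -> float:
    """Mean Absolute Scaled Error.

    Forecast MAE divided by the in-sample MAE of a seasonal naive
    forecast with period `season`. Values below 1 beat the naive baseline.
    """
    naive_errors = [
        abs(y_train[i] - y_train[i - season]) for i in range(season, len(y_train))
    ]
    scale = sum(naive_errors) / len(naive_errors)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    return mae / scale


def crps_from_samples(samples: Sequence[float], y: float) -> float:
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|.

    X and X' are independent draws from the forecast distribution,
    here approximated by the empirical distribution of `samples`.
    """
    n = len(samples)
    term_obs = sum(abs(x - y) for x in samples) / n
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term_obs - 0.5 * term_spread


# Point forecast accuracy relative to a naive baseline.
point_score = mase(y_true=[5.0, 6.0], y_pred=[5.0, 7.0], y_train=[1.0, 2.0, 3.0, 4.0])

# Probabilistic forecast quality for a single observation.
prob_score = crps_from_samples(samples=[0.0, 2.0], y=1.0)
```

MASE evaluates point forecasts and is scale-free, which makes it comparable across datasets with different units. CRPS evaluates the whole predictive distribution and reduces to the absolute error when the forecast is a single point. Lower is better for both.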

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 61

Year: 2026

Venue: N/A
