SJTU | Feb 16, 2026 | arXiv:2602.14744

Rethinking the Role of LLMs in Time Series Forecasting

Xin Qiu, Junlong Tong, Yirong Sun, Yunpu Ma, Xiaoyu Shen

AI Summary

This paper investigates the utility of large language models (LLMs) for time series forecasting (TSF) across a large-scale dataset of 8 billion observations, challenging previous studies that found limited benefits. The authors demonstrate that LLMs significantly improve forecasting performance, particularly in cross-domain generalization scenarios, and that pre-alignment strategies outperform post-alignment. They further show that both the pre-trained knowledge and the model architecture of LLMs contribute to performance, with pre-training being crucial for distribution shifts and architecture excelling at modeling complex temporal dynamics.

Key Contribution

LLMs actually *do* improve time series forecasting, especially for cross-domain generalization, overturning prior doubts with a large-scale study of 8 billion observations.

Abstract

Large language models (LLMs) have been introduced to time series forecasting (TSF) to incorporate contextual knowledge beyond numerical signals. However, existing studies question whether LLMs provide genuine benefits, often reporting comparable performance without LLMs. We show that such conclusions stem from limited evaluation settings and do not hold at scale. We conduct a large-scale study of LLM-based TSF (LLM4TSF) across 8 billion observations, 17 forecasting scenarios, 4 horizons, multiple alignment strategies, and both in-domain and out-of-domain settings. Our results demonstrate that *LLM4TSF indeed improves forecasting performance*, with especially large gains in cross-domain generalization. Pre-alignment outperforms post-alignment in over 90% of tasks. Both the pretrained knowledge and the model architecture of LLMs contribute and play complementary roles: pretraining is critical under distribution shifts, while architecture excels at modeling complex temporal dynamics. Moreover, under large-scale mixed distributions, a fully intact LLM becomes indispensable, as confirmed by token-level routing analysis and prompt-based improvements. Overall, our findings overturn prior negative assessments, establish clear conditions under which LLMs are useful, and provide practical guidance for effective model design. We release our code at https://github.com/EIT-NLP/LLM4TSF.

Eval Frameworks & Benchmarks | Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
