This paper benchmarks Time Series Foundation Models (TSFMs) against task-specific deep learning models for day-ahead probabilistic electricity price forecasting (PEPF) in European bidding zones. The authors evaluate two task-specific models, NHITS+QRA and a Normalizing-Flow forecaster, against two TSFMs, Moirai and ChronosX, using metrics such as CRPS and the Energy Score. Results indicate that TSFMs generally outperform task-specific models trained from scratch, but NHITS+QRA can match or even surpass them when supplied with additional feature groups or adapted via few-shot learning, so the computational cost of TSFMs should be weighed against their marginal gains.
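The summary does not say how CRPS and the Energy Score are computed. As a minimal sketch, assuming each forecast is represented by Monte Carlo samples (as a Normalizing-Flow or TSFM sampler would produce), both scores can be estimated with the standard energy-form identities below. The function names and the synthetic data are illustrative assumptions, not the paper's code.

```python
import numpy as np


def crps_ensemble(samples: np.ndarray, y: float) -> float:
    """Sample-based CRPS for one scalar observation (lower is better).

    Energy form: CRPS(F, y) = E|X - y| - 0.5 * E|X - X'|, with X, X'
    independent draws from F. This simple V-statistic estimator averages
    over all sample pairs, including the zero-distance i == j pairs.
    """
    samples = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(samples - y))
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return float(term1 - term2)


def energy_score(samples: np.ndarray, y: np.ndarray) -> float:
    """Sample-based Energy Score for a multivariate observation.

    samples: (m, d) array, e.g. m sampled day-ahead price paths with d = 24 hours.
    y:       (d,) array of realized prices.
    ES(F, y) = E||X - y|| - 0.5 * E||X - X'||, using the Euclidean norm,
    so it rewards forecasts that get the joint shape of the day right.
    """
    samples = np.asarray(samples, dtype=float)
    y = np.asarray(y, dtype=float)
    term1 = np.mean(np.linalg.norm(samples - y, axis=1))
    diffs = samples[:, None, :] - samples[None, :, :]  # (m, m, d) pairwise gaps
    term2 = 0.5 * np.mean(np.linalg.norm(diffs, axis=2))
    return float(term1 - term2)


# Illustrative usage on synthetic data (not real market prices).
rng = np.random.default_rng(42)
paths = rng.normal(loc=60.0, scale=8.0, size=(200, 24))  # 200 sampled paths
realized = rng.normal(loc=60.0, scale=8.0, size=24)
hourly_crps = [crps_ensemble(paths[:, h], realized[h]) for h in range(24)]
print("mean hourly CRPS:", np.mean(hourly_crps))
print("Energy Score:", energy_score(paths, realized))
```

With d = 1 the Energy Score reduces to the CRPS, which is why CRPS is typically averaged per hour while the Energy Score scores the whole 24-hour trajectory at once.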
Foundation models don't always win: task-specific models can rival or even beat them in electricity price forecasting, especially with clever feature engineering or transfer learning.
Large-scale renewable energy deployment introduces pronounced volatility into the electricity system, turning grid operation into a complex stochastic optimization problem. Accurate electricity price forecasting (EPF) is essential not only to support operational decisions, such as optimal bidding strategies and balancing power preparation, but also to reduce economic risk and improve market efficiency. Probabilistic forecasts are particularly valuable because they quantify uncertainty stemming from renewable intermittency, market coupling, and regulatory changes, enabling market participants to make informed decisions that minimize losses and optimize expected revenues. However, it remains an open question which models to employ to produce accurate forecasts: should these be task-specific machine learning (ML) models or Time Series Foundation Models (TSFMs)? In this work, we compare four models for day-ahead probabilistic EPF (PEPF) in European bidding zones. Two are task-specific: a deterministic NHITS backbone with Quantile-Regression Averaging (NHITS+QRA) and a conditional Normalizing-Flow forecaster (NF). The other two are TSFMs, namely Moirai and ChronosX. On the one hand, we find that TSFMs outperform task-specific deep learning models trained from scratch in terms of CRPS, Energy Score, and predictive interval calibration across market conditions. On the other hand, we find that well-configured task-specific models, particularly NHITS combined with QRA, achieve performance very close to that of TSFMs, and in some scenarios, such as when supplied with additional informative feature groups or adapted via few-shot learning from other European markets, they even surpass TSFMs. Overall, our findings show that while TSFMs offer expressive modeling capabilities, conventional models remain highly competitive, emphasizing the need to weigh computational expense against marginal performance improvements in PEPF.
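QRA, as commonly described in the EPF literature, turns point forecasts into quantiles by fitting a separate linear quantile regression of the realized price on those point forecasts for each quantile level. The abstract does not state which regressors the authors feed into QRA, so the sketch below assumes k point forecasts per hour (hypothetically, several NHITS runs with different seeds or calibration windows). It solves each quantile regression as a linear program with SciPy's `linprog`, sorts the predicted quantiles to remove any crossing (a common fix, assumed here), and checks interval calibration via empirical coverage. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def fit_quantile_regression(X: np.ndarray, y: np.ndarray, q: float) -> np.ndarray:
    """Linear quantile regression via its standard LP formulation.

    Minimizes sum_i rho_q(y_i - x_i @ beta) (pinball loss) by splitting each
    residual into non-negative parts u_i - v_i, costed at q and (1 - q).
    """
    n, p = X.shape
    c = np.concatenate([np.zeros(p), q * np.ones(n), (1 - q) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])  # X @ beta + u - v = y
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


def qra_fit(point_forecasts: np.ndarray, y: np.ndarray, quantiles) -> np.ndarray:
    """Fit one coefficient vector (intercept + k weights) per quantile level."""
    X = np.column_stack([np.ones(len(y)), point_forecasts])
    return np.array([fit_quantile_regression(X, y, q) for q in quantiles])


def qra_predict(betas: np.ndarray, point_forecasts: np.ndarray) -> np.ndarray:
    """Predict all quantile levels; returns an (n, n_quantiles) array."""
    X = np.column_stack([np.ones(len(point_forecasts)), point_forecasts])
    preds = X @ betas.T
    return np.sort(preds, axis=1)  # sorting removes quantile crossing


def interval_coverage(lower: np.ndarray, upper: np.ndarray, y: np.ndarray) -> float:
    """Share of observations inside [lower, upper]; ideally the nominal level."""
    return float(np.mean((y >= lower) & (y <= upper)))


# Illustrative usage on synthetic hourly data (not real market prices).
rng = np.random.default_rng(0)
n = 300
true_price = 50 + 10 * np.sin(np.arange(n) / 24 * 2 * np.pi)
forecasts = true_price[:, None] + rng.normal(0, 3, size=(n, 2))  # k = 2 point forecasts
y = true_price + rng.normal(0, 5, size=n)
quantiles = [0.05, 0.5, 0.95]
betas = qra_fit(forecasts[:200], y[:200], quantiles)  # calibration window
pred = qra_predict(betas, forecasts[200:])  # out-of-sample quantiles
print("90% interval coverage:", interval_coverage(pred[:, 0], pred[:, -1], y[200:]))
```

Because QRA only needs point forecasts and a small linear program per quantile, it adds little cost on top of the NHITS backbone, which is consistent with the abstract's point about weighing computational expense against marginal accuracy gains.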