KIT | Apr 16, 2026 | arXiv:2604.14739

Assessing the Performance-Efficiency Trade-off of Foundation Models in Probabilistic Electricity Price Forecasting

Jan Niklas Lettner, Hadeer El Ashhab, Veit Hagenmeyer, Benjamin Schäfer

AI Summary

This paper benchmarks Time Series Foundation Models (TSFMs) against task-specific deep learning models for day-ahead probabilistic electricity price forecasting (PEPF) in European bidding zones. The authors evaluate NHITS+QRA and a conditional Normalizing-Flow forecaster against Moirai and ChronosX using CRPS, Energy Score, and predictive interval calibration. TSFMs generally outperform task-specific models trained from scratch, but NHITS+QRA comes very close and can surpass them when supplied with additional informative feature groups or adapted via few-shot learning from other European markets, so the computational cost of TSFMs has to be weighed against their marginal performance gains.

Key Contribution

Foundation models don't always win: task-specific models can rival or even beat them in electricity price forecasting, especially when given additional informative features or adapted via few-shot learning from other European markets.

Abstract

Large-scale renewable energy deployment introduces pronounced volatility into the electricity system, turning grid operation into a complex stochastic optimization problem. Accurate electricity price forecasting (EPF) is essential not only to support operational decisions, such as optimal bidding strategies and balancing power preparation, but also to reduce economic risk and improve market efficiency. Probabilistic forecasts are particularly valuable because they quantify uncertainty stemming from renewable intermittency, market coupling, and regulatory changes, enabling market participants to make informed decisions that minimize losses and optimize expected revenues. However, it remains an open question which models to employ to produce accurate forecasts. Should these be task-specific machine learning (ML) models or Time Series Foundation Models (TSFMs)? In this work, we compare four models for day-ahead probabilistic EPF (PEPF) in European bidding zones: two task-specific models, a deterministic NHITS backbone with Quantile-Regression Averaging (NHITS+QRA) and a conditional Normalizing-Flow forecaster (NF), and two TSFMs, namely Moirai and ChronosX. On the one hand, we find that TSFMs outperform task-specific deep learning models trained from scratch in terms of CRPS, Energy Score, and predictive interval calibration across market conditions. On the other hand, we find that well-configured task-specific models, particularly NHITS combined with QRA, achieve performance very close to TSFMs, and in some scenarios, such as when supplied with additional informative feature groups or adapted via few-shot learning from other European markets, they can even surpass TSFMs. Overall, our findings show that while TSFMs offer expressive modeling capabilities, conventional models remain highly competitive, emphasizing the need to weigh computational expense against marginal performance improvements in PEPF.
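The abstract names CRPS and Energy Score as evaluation metrics without stating how they are computed. As a rough illustration only, the sketch below uses the standard sample-based estimators of both scores; it is not taken from the paper, and the function names (`crps_from_samples`, `energy_score`) and the toy numbers are hypothetical. It assumes a forecast is given as a finite set of samples: scalar prices for CRPS, and price paths (for example 24 hourly prices) for the Energy Score.

```python
import math
from itertools import product


def crps_from_samples(samples, y):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|.

    samples: list of scalar forecast samples (hypothetical input format).
    y: the observed value. Lower is better.
    """
    m = len(samples)
    # Average distance between forecast samples and the observation.
    term_obs = sum(abs(x - y) for x in samples) / m
    # Average pairwise distance between samples (rewards sharpness).
    term_spread = sum(abs(a - b) for a, b in product(samples, repeat=2)) / (m * m)
    return term_obs - 0.5 * term_spread


def energy_score(paths, y):
    """Sample-based Energy Score, the multivariate analogue of CRPS.

    paths: list of equal-length sequences, one per sampled price path.
    y: the observed path of the same length. Lower is better.
    """
    m = len(paths)
    # Euclidean distance replaces the absolute value used in CRPS.
    term_obs = sum(math.dist(p, y) for p in paths) / m
    term_spread = sum(math.dist(a, b) for a, b in product(paths, repeat=2)) / (m * m)
    return term_obs - 0.5 * term_spread


if __name__ == "__main__":
    # Toy numbers, invented purely for illustration.
    print(crps_from_samples([0.0, 2.0], 1.0))
    print(energy_score([[40.0, 55.0], [45.0, 50.0]], [42.0, 53.0]))
```

For a single-sample (point) forecast the spread term vanishes and CRPS reduces to the absolute error, which is one reason it is a convenient common yardstick for deterministic and probabilistic models. Both estimators here cost O(m²) in the number of samples.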

Eval Frameworks & Benchmarks | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 39

Year: 2026

Venue: N/A
