Cambridge, Institute of Geodesy and Photogrammetry, Scalable Parallel Computing Lab, Seminar for Applied Mathematics, Swiss Data Science Center, UMass
May 28, 2026 · arXiv:2605.30184

Can AI Weather Models Predict Beyond Two Weeks? A Quantitative Benchmark and Analysis of Long Rollouts

Fanny Lehmann, Firat Ozdemir, Yun Cheng, Torsten Hoefler, Sebastian Schemm, Benedikt Soja, Siddhartha Mishra

AI Summary

This paper introduces a taxonomy of failure modes (blow-up, drift, loss of seasonality) for long-range AI weather forecasts and evaluates year-long rollouts of nine state-of-the-art models. The analysis reveals that model stability depends on how small spatio-temporal scales are handled: unstable models amplify high-frequency energy, while stable models act as denoisers. Ablation studies on Vision Transformer architectures confirm these findings, and the authors further show that stable models generate unique weather trajectories conditioned on the initial state.
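To make the three failure regimes concrete, here is a minimal sketch of how one might label a single rollout. It assumes the rollout has been reduced to a time series of a global-mean variable (for example, global-mean 2 m temperature) and that a climatological seasonal cycle is available at the same time steps. The function name, the thresholds, and the order in which the checks are applied are all hypothetical choices for illustration; they are not the paper's criteria.

```python
import numpy as np


def classify_rollout(x, clim, clim_std, blowup_sigma=50.0,
                     min_season_corr=0.5, drift_sigma=3.0):
    """Assign one illustrative failure label to a long rollout.

    x        : rollout of a global-mean variable, shape (T,)
    clim     : climatological seasonal cycle at the same T times
    clim_std : typical interannual standard deviation (scalar)
    All thresholds are hypothetical, not the paper's criteria.
    """
    x = np.asarray(x, dtype=float)
    clim = np.asarray(clim, dtype=float)

    # 1) Blow-up: non-finite values or anomalies far outside any plausible range.
    if not np.all(np.isfinite(x)):
        return "blow-up"
    anom = x - clim
    if np.max(np.abs(anom)) > blowup_sigma * clim_std:
        return "blow-up"

    # 2) Loss of seasonality: the rollout no longer co-varies with the
    #    climatological seasonal cycle (Pearson correlation; a flat rollout
    #    has zero variance and is treated as uncorrelated).
    xc, cc = x - x.mean(), clim - clim.mean()
    denom = np.sqrt(np.sum(xc ** 2) * np.sum(cc ** 2))
    corr = float(np.sum(xc * cc) / denom) if denom > 0 else 0.0
    if corr < min_season_corr:
        return "loss of seasonality"

    # 3) Drift: the mean anomaly over the last quarter of the rollout has
    #    moved away from the mean anomaly over the first quarter.
    q = max(len(x) // 4, 1)
    if abs(anom[-q:].mean() - anom[:q].mean()) > drift_sigma * clim_std:
        return "drift"
    return "stable"


# Usage on a synthetic daily climatology (one year, illustrative numbers).
t = np.arange(365)
clim = 288.0 + 10.0 * np.sin(2 * np.pi * t / 365)
print(classify_rollout(clim + 0.02 * t, clim, clim_std=1.0))  # drift
```

The checks run in a fixed priority order, so a rollout that both drifts strongly and loses its seasonal cycle receives only one label; a real benchmark would more likely report each diagnostic separately and over many variables.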

Key Contribution

AI weather models are not mere stochastic parrots: models that damp rather than amplify high-frequency energy remain stable over year-long rollouts and generate unique, plausible weather trajectories conditioned on the initial state.

Abstract

While AI weather models excel at short-to-medium range forecasts (up to 15 days), they frequently suffer from ill-defined "instabilities" when rolled out over longer horizons. This work addresses the lack of a formal taxonomy by running year-long rollouts of nine state-of-the-art AI weather models and categorizing their failures into three distinct regimes: blow-up, drift, and loss of seasonality. Our analysis reveals that stability hinges on the treatment of small spatio-temporal scales: unstable models amplify high-frequency energy, while stable models act as denoisers when noise is added to their inputs. Far from reducing these models to mere stochastic parrots, our findings highlight that stable models generate unique weather trajectories, conditioned on the initial state. We verify our findings through ablation studies on architectural design choices, conducted using state-of-the-art Vision Transformer (ViT) AI weather model architectures.
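The distinction between models that amplify high-frequency energy and models that act as denoisers can be illustrated with a perturbation test: add small noise to the input, run one model step, and compare the small-scale energy of the response with that of the injected noise. The sketch below is an illustration under simplifying assumptions, not the paper's diagnostic. It uses a periodic 1-D field instead of a spherical grid, an `rfft` spectrum instead of spherical harmonics, an arbitrary cutoff wavenumber `k_cut`, and two hypothetical linear toy "models" (`smoothing_model`, `sharpening_model`) standing in for trained networks.

```python
import numpy as np


def high_freq_energy(field, k_cut):
    """Spectral energy in wavenumbers >= k_cut of a periodic 1-D field."""
    spec = np.fft.rfft(field)
    return float(np.sum(np.abs(spec[k_cut:]) ** 2))


def noise_gain(model, x0, noise, k_cut):
    """High-frequency energy of the model's response to a perturbation,
    divided by the high-frequency energy of the perturbation itself.
    < 1: the model damps the noise (denoiser-like); > 1: it amplifies it."""
    response = model(x0 + noise) - model(x0)
    return high_freq_energy(response, k_cut) / high_freq_energy(noise, k_cut)


# Toy one-step "models" on a periodic 1-D grid (hypothetical stand-ins).
def smoothing_model(x):
    # 1-2-1 filter: damps short waves and removes the grid-scale mode.
    return 0.25 * np.roll(x, 1) + 0.5 * x + 0.25 * np.roll(x, -1)


def sharpening_model(x):
    # Adds a discrete negative Laplacian: boosts short waves up to 3x.
    return x + 0.5 * (2 * x - np.roll(x, 1) - np.roll(x, -1))


n = 128
grid = np.arange(n)
x0 = np.sin(2 * np.pi * 3 * grid / n)             # smooth large-scale state
noise = 1e-3 * np.random.default_rng(0).standard_normal(n)
k_cut = n // 4                                     # "small scales": k >= 32
print(noise_gain(smoothing_model, x0, noise, k_cut))   # below 1
print(noise_gain(sharpening_model, x0, noise, k_cut))  # above 1
```

In a long rollout the same gain compounds at every step, which is why a per-step gain above 1 at small scales is a plausible precursor of blow-up, while a gain below 1 keeps injected noise from accumulating.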


Citation Metrics

Citations: 0

Influential citations: 0

References: 32

Year: 2026

Venue: N/A
