Feb 26, 2026 · arXiv:2602.22962

Scaling Laws of Global Weather Models

Yuejiang Yu, Langwen Huang, Alexandru Calotoiu, Torsten Hoefler

AI Summary

This paper investigates scaling laws for data-driven global weather models, analyzing the impact of model size (N), dataset size (D), and compute budget (C) on validation loss. The study reveals that the Aurora model exhibits the strongest data-scaling behavior, while GraphCast demonstrates high parameter efficiency but suffers from hardware underutilization. The analysis indicates that, for a fixed compute budget, prioritizing longer training durations over larger model sizes yields better performance, and wider architectures are favored over deeper ones, differing from trends in language models.

Key Contribution

Weather models defy language-model scaling trends: wider architectures outperform deeper ones, and under a fixed compute budget longer training yields bigger gains than larger models.

Abstract

Data-driven models are revolutionizing weather forecasting. To optimize training efficiency and model performance, this paper analyzes empirical scaling laws within this domain. We investigate the relationship between model performance (validation loss) and three key factors: model size ($N$), dataset size ($D$), and compute budget ($C$). Across a range of models, we find that Aurora exhibits the strongest data-scaling behavior: increasing the training dataset by 10x reduces validation loss by up to 3.2x. GraphCast demonstrates the highest parameter efficiency, yet suffers from limited hardware utilization. Our compute-optimal analysis indicates that, under fixed compute budgets, allocating resources to longer training durations yields greater performance gains than increasing model size. Furthermore, we analyze model shape and uncover scaling behaviors that differ fundamentally from those observed in language models: weather forecasting models consistently favor increased width over depth. These findings suggest that future weather models should prioritize wider architectures and larger effective training datasets to maximize predictive performance.
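The reported data-scaling figure for Aurora can be turned into a rough exponent. The sketch below assumes a pure power law of the form loss ∝ D^(-alpha) with no irreducible-loss term; that functional form is an assumption made here for illustration, and the paper's own fit may differ. The function names are hypothetical. Because the abstract says "up to 3.2x", the result is best read as an upper bound on the implied exponent.

```python
import math


def implied_exponent(data_factor: float, loss_factor: float) -> float:
    """Exponent alpha such that data_factor ** alpha == loss_factor.

    Assumes loss is proportional to D ** (-alpha) (a pure power law,
    which is an illustrative assumption, not the paper's stated fit).
    """
    return math.log(loss_factor) / math.log(data_factor)


def loss_reduction(data_factor: float, alpha: float) -> float:
    """Factor by which loss shrinks when the dataset grows by data_factor."""
    return data_factor ** alpha


# Figures from the abstract: 10x more data, up to 3.2x lower validation loss.
alpha = implied_exponent(10.0, 3.2)
print(f"implied alpha is about {alpha:.3f}")

# Extrapolation under the same assumption: 100x more data.
print(f"100x data gives about {loss_reduction(100.0, alpha):.2f}x lower loss")
```

Under this assumption the exponent comes out near 0.5, so each further 10x of data would again cut the loss by about 3.2x. Real scaling curves usually flatten as they approach an irreducible loss, so the extrapolation should be treated with caution.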

Data Curation & Synthetic Data · Scaling Laws & Emergent Abilities · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 34

Year: 2026

Venue: N/A
