The paper introduces Agent-Guided Cross-modal Decoding (AGCD), a decoding-time prior-injection paradigm that leverages MLLMs to extract state-conditioned physics-priors from multivariate atmospheric data. AGCD employs a multi-agent meteorological narration pipeline and cross-modal region interaction decoding to refine visual features with these priors. Experiments on WeatherBench demonstrate that AGCD consistently improves 6-hour forecasting accuracy and long-horizon stability across different resolutions and backbones by reducing error accumulation in autoregressive rollouts.
Injecting physics-priors derived from MLLMs at decoding time consistently improves weather forecasting accuracy and stability, including in 48-hour autoregressive rollouts.
Accurate weather forecasting is more than grid-wise regression: it must preserve coherent synoptic structures and the physical consistency of meteorological fields, especially under autoregressive rollouts, where small one-step errors can amplify into structural bias. Existing physics-priors approaches typically impose global, once-and-for-all constraints via architectures, regularization, or NWP coupling, offering limited state-adaptive and sample-specific controllability at deployment. To bridge this gap, we propose Agent-Guided Cross-modal Decoding (AGCD), a plug-and-play decoding-time prior-injection paradigm that derives state-conditioned physics-priors from the current multivariate atmosphere and injects them into forecasters in a controllable and reusable way. Specifically, we design a multi-agent meteorological narration pipeline that uses MLLMs to extract meteorological elements and generate state-conditioned physics-priors. To apply the priors, AGCD further introduces cross-modal region interaction decoding, which performs region-aware multi-scale tokenization and efficient physics-priors injection to refine visual features without changing the backbone interface. Experiments on WeatherBench demonstrate consistent gains for 6-hour forecasting across two resolutions (5.625 and 1.40625 degrees) and diverse backbones (generic and weather-specialized), including strictly causal 48-hour autoregressive rollouts that reduce early-stage error accumulation and improve long-horizon stability.
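To make the rollout setting concrete, the following is a minimal sketch of a strictly causal autoregressive rollout in which a 6-hour forecaster is applied eight times to reach 48 hours, with an optional decoding-time refinement hook. It is not the authors' implementation: `rollout`, `biased_step`, and `damp_increment` are hypothetical names, the state is a toy list of floats rather than a gridded field, and the refinement is a simple damping stand-in rather than AGCD's prior injection.

```python
from typing import Callable, List, Optional

State = List[float]  # toy stand-in for a gridded multivariate field (assumption)


def rollout(
    step: Callable[[State], State],
    state: State,
    horizon_hours: int = 48,
    step_hours: int = 6,
    refine: Optional[Callable[[State, State], State]] = None,
) -> List[State]:
    """Apply a one-step forecaster repeatedly, feeding each output back as input."""
    if horizon_hours % step_hours != 0:
        raise ValueError("horizon_hours must be a multiple of step_hours")
    trajectory: List[State] = []
    for _ in range(horizon_hours // step_hours):
        prediction = step(state)
        if refine is not None:
            # Hypothetical decoding-time hook: it sees only the current input
            # state and the raw prediction, so the rollout stays causal.
            prediction = refine(state, prediction)
        trajectory.append(prediction)
        state = prediction  # the (possibly refined) output becomes the next input
    return trajectory


def biased_step(state: State) -> State:
    """Toy forecaster with a constant one-step bias of +0.25 per step."""
    return [x + 0.25 for x in state]


def damp_increment(state: State, prediction: State) -> State:
    """Toy refinement (NOT AGCD): halve the predicted change at each step."""
    return [s + 0.5 * (p - s) for s, p in zip(state, prediction)]


plain = rollout(biased_step, [0.0, 1.0])
refined = rollout(biased_step, [0.0, 1.0], refine=damp_increment)
```

If the true state were constant, the unrefined toy rollout would drift by 2.0 after eight steps and the damped one by 1.0. These numbers only illustrate how a one-step bias compounds when outputs are fed back as inputs; they say nothing about the gains reported in the paper.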