CRAN-PM, a dual-branch Vision Transformer, is introduced to address the scalability limitations of Transformers for high-resolution spatio-temporal prediction of PM2.5 levels. It fuses global meteorological data with local PM2.5 data using cross-resolution attention, and incorporates elevation-aware self-attention and wind-guided cross-attention to learn physically consistent feature representations. Results on daily PM2.5 forecasting across Europe demonstrate a reduction in RMSE by 4.7% at T+1 and 10.7% at T+3 compared to single-scale baselines, along with a 36% bias reduction in complex terrain, while maintaining memory efficiency and fast inference.
You can now forecast continent-scale, high-resolution air quality maps in under 2 seconds on a single GPU, thanks to a novel cross-resolution attention mechanism that outperforms single-scale baselines.
Vision Transformers have achieved remarkable success in spatio-temporal prediction, but their scalability remains limited for the ultra-high-resolution, continent-scale domains required in real-world environmental monitoring. A single European air-quality map at 1 km resolution comprises 29 million pixels, far beyond the limits of naive self-attention. We introduce CRAN-PM, a dual-branch Vision Transformer that leverages cross-resolution attention to efficiently fuse global meteorological data (25 km) with local high-resolution PM2.5 observations (1 km) at the current time. Rather than feeding physically driven factors such as temperature and topography to the model as inputs, we introduce elevation-aware self-attention and wind-guided cross-attention to force the network to learn physically consistent feature representations for PM2.5 forecasting. CRAN-PM is fully trainable and memory-efficient, generating the complete 29-million-pixel European map in 1.8 seconds on a single GPU. Evaluated on daily PM2.5 forecasting throughout Europe in 2022 (362 days, 2,971 European Environment Agency (EEA) stations), it reduces RMSE by 4.7% at T+1 and 10.7% at T+3 compared to the best single-scale baseline, while reducing bias in complex terrain by 36%.
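The efficiency claim rests on the shape of the attention itself: queries come from the many high-resolution local tokens, while keys and values come from the much smaller set of coarse meteorological tokens, so the cost scales with N_local × N_global rather than N_local². The sketch below illustrates this pattern in NumPy under stated assumptions; the token counts, projection setup, and the optional additive bias term (a stand-in for how elevation- or wind-derived terms might enter the attention logits) are illustrative, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_resolution_attention(local_tokens, global_tokens, Wq, Wk, Wv, bias=None):
    """Queries from N_local high-res (1 km) tokens attend to N_global coarse
    (25 km) meteorological tokens.  Attention matrix is (N_local, N_global),
    avoiding the (N_local, N_local) cost of full self-attention.
    `bias` is an optional additive term on the logits; a physics-aware model
    could derive it from elevation differences or wind alignment (assumption)."""
    Q = local_tokens @ Wq          # (N_local, d)
    K = global_tokens @ Wk         # (N_global, d)
    V = global_tokens @ Wv         # (N_global, d)
    logits = Q @ K.T / np.sqrt(Q.shape[-1])   # (N_local, N_global)
    if bias is not None:
        logits = logits + bias
    return softmax(logits, axis=-1) @ V       # (N_local, d)

# toy example: a 64x64 patch of 1 km tokens attends to an 8x8 coarse met grid
rng = np.random.default_rng(0)
n_local, n_global, d_in, d = 4096, 64, 16, 32
local = rng.standard_normal((n_local, d_in))
met = rng.standard_normal((n_global, d_in))
Wq, Wk, Wv = (rng.standard_normal((d_in, d)) / np.sqrt(d_in) for _ in range(3))
out = cross_resolution_attention(local, met, Wq, Wk, Wv)
print(out.shape)  # (4096, 32)
```

At these toy sizes the attention matrix has 4096 × 64 entries instead of 4096², which is the kind of asymmetry that lets a cross-resolution design cover a 29-million-pixel domain within a single GPU's memory budget.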