The paper introduces Stochastic Attention, an inference-time technique that randomizes attention weights via multinomial sampling to generate predictive ensembles from scientific foundation models without retraining. The authors tune a single concentration parameter with a calibration objective that aligns the stochastic attention output with the target, which reduces post-hoc uncertainty calibration to an efficient univariate problem. Experiments on weather forecasting, time series forecasting, and a regression task show that Stochastic Attention achieves stronger calibration and sharper prediction intervals at comparable coverage than uncertainty-aware baselines, while requiring only minutes of post-hoc tuning rather than days of retraining.
Get calibrated uncertainty estimates from your scientific foundation models in minutes, not days, with this simple attention randomization trick.
Transformer-based scientific foundation models are increasingly deployed in high-stakes settings, but current architectures give deterministic outputs and provide limited support for calibrated predictive uncertainty. We propose Stochastic Attention, a lightweight inference-time modification that randomizes attention by replacing softmax weights with normalized multinomial samples controlled by a single concentration parameter, producing predictive ensembles without retraining. To set this parameter, we introduce a calibration objective that matches the stochastic attention output with the target, yielding an efficient univariate post-hoc tuning problem. We evaluate this mechanism on two scientific foundation models for weather and time series forecasting, along with an additional regression task. Across benchmarks against uncertainty-aware baselines, we find that Stochastic Attention achieves the strongest native calibration and the sharpest prediction intervals at comparable coverage, while requiring only minutes of post-hoc tuning versus days of retraining for competitive baselines.
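
As a rough illustration of the mechanism the abstract describes, the sketch below replaces each row of softmax attention weights with a normalized multinomial sample and repeats the forward pass to form an ensemble. The abstract does not give the exact parameterization, so this is an assumption: the concentration parameter is treated here as the number of multinomial trials, and the counts are divided by that number so each row still sums to one. The function names (`sample_attention_weights`, `stochastic_attention`, `attention_ensemble`) are hypothetical and not taken from the paper, and the example covers a single attention head in NumPy rather than a full foundation model.

```python
import numpy as np


def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def sample_attention_weights(weights, concentration, rng):
    """Replace each row of softmax weights with a normalized multinomial sample.

    Assumption: `concentration` is the number of multinomial trials. Larger
    values give samples closer to the original softmax weights.
    """
    weights = np.asarray(weights, dtype=np.float64)
    counts = np.stack([rng.multinomial(concentration, row) for row in weights])
    return counts / concentration  # each row sums to 1


def stochastic_attention(q, k, v, concentration, rng):
    """One stochastic forward pass of single-head scaled dot-product attention."""
    d = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d))  # shape (n_queries, n_keys)
    sampled = sample_attention_weights(weights, concentration, rng)
    return sampled @ v


def attention_ensemble(q, k, v, concentration, n_members, rng):
    """Stack several stochastic passes into an ensemble of outputs."""
    members = [stochastic_attention(q, k, v, concentration, rng)
               for _ in range(n_members)]
    return np.stack(members)  # shape (n_members, n_queries, d_value)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(3, 4))
    k = rng.normal(size=(5, 4))
    v = rng.normal(size=(5, 2))
    ensemble = attention_ensemble(q, k, v, concentration=50, n_members=8, rng=rng)
    # The spread across members is a simple per-output uncertainty proxy.
    print(ensemble.mean(axis=0).shape, ensemble.std(axis=0).shape)
```

Under this assumed parameterization, a small concentration yields widely spread ensemble members and a large one collapses toward the deterministic output, which is why tuning it reduces to a one-dimensional search.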