The paper introduces SPICE, a Bayesian in-context reinforcement learning (ICRL) method that learns a prior over Q-values with a deep ensemble and updates it at test time via Bayesian inference on in-context information. This approach addresses limitations of existing ICRL methods by enabling improvement beyond the training distribution and robustness to suboptimal training data. SPICE achieves regret-optimal behavior in stochastic bandits and finite-horizon MDPs, even when pre-trained on suboptimal trajectories, and demonstrates superior empirical performance compared to existing ICRL and meta-RL methods on bandit and control benchmarks.
Even when trained on suboptimal data, a Bayesian in-context RL agent can achieve near-optimal decisions on unseen tasks by fusing a learned Q-value prior with in-context information and employing an upper-confidence bound for exploration.
In-context reinforcement learning (ICRL) promises fast adaptation to unseen environments without parameter updates, but current methods either cannot improve beyond the training distribution or require near-optimal data, limiting practical adoption. We introduce SPICE, a Bayesian ICRL method that learns a prior over Q-values via a deep ensemble and updates this prior at test time through Bayesian updates on in-context information. To recover from poor priors resulting from training on sub-optimal data, our online inference follows an Upper-Confidence Bound rule that favours exploration and adaptation. We prove that SPICE achieves regret-optimal behaviour in both stochastic bandits and finite-horizon MDPs, even when pretrained only on suboptimal trajectories. We validate these findings empirically across bandit and control benchmarks: SPICE achieves near-optimal decisions on unseen tasks and substantially reduces regret compared to prior ICRL and meta-RL approaches, while adapting rapidly and remaining robust under distribution shift.
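The recipe above can be sketched in a Gaussian stochastic bandit. This is a minimal illustration, assuming a deep ensemble has already supplied a prior mean and variance per arm; the function name `spice_ucb_bandit`, the conjugate Gaussian posterior update, and all constants are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def spice_ucb_bandit(prior_means, prior_vars, noise_var,
                     rewards_seen, counts, beta=2.0):
    """Pick an arm by UCB over a Gaussian posterior on each arm's mean reward.

    Illustrative sketch: the prior (mean/variance per arm) stands in for a
    learned ensemble prior over Q-values; the in-context history enters
    through a conjugate Gaussian update (precisions add, means are
    precision-weighted).
    """
    prior_prec = 1.0 / prior_vars
    data_prec = counts / noise_var
    post_var = 1.0 / (prior_prec + data_prec)
    emp_mean = np.where(counts > 0, rewards_seen / np.maximum(counts, 1), 0.0)
    post_mean = post_var * (prior_prec * prior_means + data_prec * emp_mean)
    # Optimistic (UCB-style) action selection lets the agent explore its way
    # out of a poor prior instead of trusting it blindly.
    return int(np.argmax(post_mean + beta * np.sqrt(post_var)))

# Toy run: the prior favours the wrong arm (as if pre-trained on suboptimal
# data), but in-context evidence overrides it.
rng = np.random.default_rng(0)
true_means = np.array([1.0, 0.2])
prior_means = np.array([0.0, 0.5])   # misleading prior: prefers arm 1
prior_vars = np.array([1.0, 1.0])
noise_var = 0.25
rewards_seen, counts = np.zeros(2), np.zeros(2)
for t in range(200):
    a = spice_ucb_bandit(prior_means, prior_vars, noise_var, rewards_seen, counts)
    r = rng.normal(true_means[a], np.sqrt(noise_var))
    rewards_seen[a] += r
    counts[a] += 1
# After enough rounds the posterior concentrates on the truly better arm 0.
```

With no observations the rule is driven purely by the prior plus its uncertainty bonus; as context accumulates, the data precision dominates and the agent converges on the genuinely better arm, mirroring the paper's claim of recovery from suboptimal pre-training.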