Even when trained on suboptimal data, a Bayesian in-context RL agent can make near-optimal decisions on unseen tasks by fusing a learned Q-value prior with in-context information and employing an upper-confidence bound for exploration.
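The fusion described above can be illustrated with a minimal sketch: a conjugate Gaussian update combines a learned per-action Q-value prior with rewards observed in context, and the agent then selects the action maximizing an upper-confidence bound. The Gaussian/known-noise model, the `beta` bonus weight, and all function names here are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def posterior(prior_mean, prior_var, rewards, noise_var=1.0):
    """Conjugate Gaussian update of one action's Q-value estimate,
    fusing the learned prior with in-context reward observations."""
    n = len(rewards)
    if n == 0:
        return prior_mean, prior_var  # no in-context evidence: keep the prior
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + sum(rewards) / noise_var)
    return post_mean, post_var

def ucb_action(prior_means, prior_vars, history, beta=2.0):
    """Pick argmax of posterior mean + beta * posterior std (UCB).

    history maps action index -> list of rewards seen in context."""
    scores = []
    for a, (m0, v0) in enumerate(zip(prior_means, prior_vars)):
        m, v = posterior(m0, v0, history.get(a, []))
        scores.append(m + beta * np.sqrt(v))
    return int(np.argmax(scores))
```

With no in-context data the prior dominates, so the action with the higher prior mean is chosen; as observations accumulate, the posterior variance shrinks and the UCB bonus fades, shifting the agent from prior-driven exploration toward data-driven exploitation.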