This paper investigates the challenge of adapting visual model-based reinforcement learning (MBRL) agents to distribution shift and finds that existing adaptation methods often fail to improve closed-loop control or degrade in-distribution performance. To address this, the authors propose JEPA-Indexed Local Expert Growth, which uses a frozen JEPA representation for problem indexing and cluster-specific residual experts for local action corrections. Experiments show that the harder-pair variant achieves statistically significant out-of-distribution improvements on all four evaluated shift conditions while preserving in-distribution performance, which supports viewing adaptation as incremental knowledge growth.
Detecting distribution shift in visual MBRL is often the easier part; the real challenge is applying the right action-level corrections, which this paper tackles with a local expert growth strategy.
Visual model-based reinforcement learning (MBRL) agents can perform well on the training distribution, but often break down once the test environment shifts. In visual MBRL, recognizing that a shift has occurred is often the easier part; the harder part is turning that recognition into useful action-level correction. We study several ways of responding to shift, including planning penalties, direct fine-tuning, global residual correction, and coarse gating. In our experiments, these approaches either do not improve closed-loop control or hurt in-distribution (ID) performance. Based on these negative results, we propose JEPA-Indexed Local Expert Growth. The method uses a frozen JEPA representation only for problem indexing, while cluster-specific residual experts add local action corrections on top of the original controller. The baseline controller itself is not modified. Using paired-bootstrap evaluation, we find that the original naive-preference variant is not stable under stricter testing. In contrast, the harder-pair variant produces statistically significant OOD improvements on all four evaluated shift conditions while preserving ID performance. The learned experts also remain useful when the same shift is encountered again, which supports the view of adaptation as incremental knowledge growth rather than repeated full retraining. We further show that automatic ID rejection can be achieved with simple density models, whereas fine-grained discrimination among OOD sub-families is limited by the representation. Overall, the results indicate that, for visual MBRL under distribution shift, the main challenge is not simply noticing that the environment has changed, but applying the right local action correction after the change has been recognized.
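The routing idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `LocalExpertRouter`, its fields, the toy two-dimensional embeddings, and the constant residual are all hypothetical. The sketch assumes the frozen encoder has already produced an embedding `z`, stands in for the density model with a simple distance-to-center threshold, and indexes clusters by nearest centroid. The base controller's action is passed in and never modified; an expert, if one exists for the cluster, only adds a residual on top.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


@dataclass
class LocalExpertRouter:
    """Hypothetical router: frozen embedding -> cluster index -> residual expert."""

    centroids: List[List[float]]  # cluster centers in the frozen embedding space
    id_center: List[float]        # assumed center of in-distribution embeddings
    id_radius: float              # assumed threshold; stand-in for a density model
    experts: Dict[int, Callable[[Vector], List[float]]] = field(default_factory=dict)

    def cluster_of(self, z: Vector) -> Optional[int]:
        """Return None for inputs treated as ID, else the nearest cluster index."""
        if math.sqrt(squared_distance(z, self.id_center)) <= self.id_radius:
            return None
        return min(
            range(len(self.centroids)),
            key=lambda k: squared_distance(z, self.centroids[k]),
        )

    def act(self, z: Vector, base_action: Vector) -> List[float]:
        """Add a local residual to the base action; pass it through otherwise."""
        k = self.cluster_of(z)
        expert = self.experts.get(k) if k is not None else None
        if expert is None:
            # ID input, or a cluster with no expert yet: keep the base action.
            return list(base_action)
        residual = expert(z)
        return [a + r for a, r in zip(base_action, residual)]


# Toy usage with made-up numbers.
router = LocalExpertRouter(
    centroids=[[5.0, 0.0], [0.0, 5.0]],
    id_center=[0.0, 0.0],
    id_radius=1.0,
)
router.experts[0] = lambda z: [0.1, -0.2]  # constant residual, for illustration only

print(router.act([0.2, 0.1], [1.0, 1.0]))  # treated as ID: base action unchanged
print(router.act([4.8, 0.3], [1.0, 1.0]))  # cluster 0: residual added
print(router.act([0.1, 5.2], [1.0, 1.0]))  # cluster 1, no expert yet: unchanged
```

Because experts are stored per cluster, encountering a new shift in this sketch only adds an entry to `experts`. Existing entries and the base controller stay untouched, which mirrors the incremental-growth framing in the abstract.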