Apr 28, 2026 · arXiv:2604.25496

Improving Zero-Shot Offline RL via Behavioral Task Sampling

Nazim Bendib, Nicolas Perrin-Gilbert, Olivier Sigaud

AI Summary

This paper addresses suboptimal zero-shot generalization in offline reinforcement learning (RL), which the authors attribute to randomly sampling task vectors during policy training. They propose extracting task vectors directly from the offline dataset and using them to define the task distribution for policy training, instead of relying on random sampling. Experiments across multiple benchmark environments and baselines show that this approach improves zero-shot performance by an average of 20%.

Key Contribution

Randomly sampling task vectors in offline zero-shot RL leads to suboptimal generalization; extracting task vectors directly from the dataset instead improves zero-shot performance by an average of 20%.

Abstract

Offline zero-shot reinforcement learning (RL) aims to learn agents that optimize unseen reward functions without additional environment interaction. The standard approach to this problem trains task-conditioned policies by sampling task vectors that define linear reward functions over learned state representations. In most existing algorithms, these task vectors are randomly sampled, implicitly assuming this adequately captures the structure of the task space. We argue that doing so leads to suboptimal zero-shot generalization. To address this limitation, we propose extracting task vectors directly from the offline dataset and using them to define the task distribution used for policy training. We introduce a simple and general reward function extraction procedure that integrates into existing offline zero-shot RL algorithms. Across multiple benchmark environments and baselines, our approach improves zero-shot performance by an average of 20%, highlighting the importance of principled task sampling in offline zero-shot RL.
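To make the abstract's setup concrete, the sketch below contrasts two ways of producing task vectors z for a linear reward r(s) = φ(s)ᵀz over learned features φ. It is a minimal illustration under stated assumptions, not the authors' extraction procedure: the random baseline assumes z is drawn uniformly on a sphere of radius √d, and sample_dataset_tasks uses a hypothetical rule (reward 1 on the states of a random segment of one dataset trajectory, 0 elsewhere) whose ridge-regression fit onto φ yields a dataset-derived task vector. The feature matrix phi, the trajectory labels traj_ids, and all function names are illustrative.

```python
import numpy as np


def sample_random_tasks(n, d, rng):
    """Baseline: n task vectors drawn uniformly on the sphere of radius sqrt(d)."""
    z = rng.standard_normal((n, d))
    z /= np.linalg.norm(z, axis=1, keepdims=True)
    return np.sqrt(d) * z


def fit_task_vector(phi, rewards, reg=1e-6):
    """Ridge fit of z so that phi @ z approximates the per-state rewards,
    i.e. the best linear reward r(s) = phi(s)^T z in the least-squares sense."""
    d = phi.shape[1]
    gram = phi.T @ phi + reg * np.eye(d)
    return np.linalg.solve(gram, phi.T @ rewards)


def sample_dataset_tasks(n, phi, traj_ids, rng, seg_len=10):
    """HYPOTHETICAL dataset-derived task sampler (not the paper's procedure).

    For each task: pick a random trajectory and a random segment of it, set
    reward 1 on the segment's states and 0 elsewhere, regress that reward onto
    phi, then rescale z to the same sqrt(d) sphere as the random baseline.
    """
    d = phi.shape[1]
    trajs = np.unique(traj_ids)
    tasks = []
    for _ in range(n):
        idx = np.flatnonzero(traj_ids == rng.choice(trajs))  # states of one trajectory
        start = rng.integers(0, max(1, len(idx) - seg_len + 1))
        rewards = np.zeros(len(phi))
        rewards[idx[start:start + seg_len]] = 1.0
        z = fit_task_vector(phi, rewards)
        tasks.append(np.sqrt(d) * z / np.linalg.norm(z))
    return np.stack(tasks)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    phi = rng.standard_normal((500, 8))        # stand-in features for 500 dataset states
    traj_ids = np.repeat(np.arange(5), 100)    # 5 trajectories of 100 states each
    z_rand = sample_random_tasks(4, 8, rng)
    z_data = sample_dataset_tasks(4, phi, traj_ids, rng)
    print(z_rand.shape, z_data.shape)          # (4, 8) (4, 8)
```

Rescaling the dataset-derived vectors onto the same √d sphere is also an assumption, made here so that the two task distributions differ only in direction and not in scale; a task-conditioned policy would then be trained on whichever set of z vectors is supplied.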

Robotics & Embodied AI · World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
