BAIR | Mar 5, 2026 | arXiv:2603.05066

Reward-Conditioned Reinforcement Learning

Michal Nauman, Marek Cygan, Pieter Abbeel

AI Summary

Reward-Conditioned Reinforcement Learning (RCRL) trains a single RL agent to optimize a family of reward specifications, even while collecting experience under only one nominal objective. This is achieved by conditioning the agent on reward parameterizations and learning multiple reward objectives from a shared replay buffer entirely off-policy. Experiments across diverse benchmarks show that RCRL improves performance under the nominal reward and enables efficient adaptation to new reward parameterizations, leading to robust and steerable policies.

Key Contribution

Train one RL agent to handle a whole family of reward functions, unlocking robust and adaptable policies without the complexity of multi-task training.

Abstract

RL agents are typically trained under a single, fixed reward function, which makes them brittle to reward misspecification and limits their ability to adapt to changing task preferences. We introduce Reward-Conditioned Reinforcement Learning (RCRL), a framework that trains a single agent to optimize a family of reward specifications while collecting experience under only one nominal objective. RCRL conditions the agent on reward parameterizations and learns multiple reward objectives from shared replay data entirely off-policy, enabling a single policy to represent reward-specific behaviors. Across single-task, multi-task, and vision-based benchmarks, we show that RCRL not only improves performance under the nominal reward parameterization, but also enables efficient adaptation to new parameterizations. Our results demonstrate that RCRL provides a scalable mechanism for learning robust, steerable policies without sacrificing the simplicity of single-task training.
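The abstract does not give implementation details, so the following is only a minimal sketch of what "conditioning on reward parameterizations" and "learning from shared replay data off-policy" could look like. It assumes, purely for illustration, that the reward family is a weighted sum of per-transition reward terms stored in the replay buffer; the names `relabel_batch`, `sample_reward_weights`, `reward_features`, and `spread` are hypothetical and do not come from the paper.

```python
import numpy as np

def sample_reward_weights(rng, batch_size, nominal, spread=0.5):
    """Draw one reward parameter vector per transition, centered on the nominal weights.
    (Assumed sampling scheme: uniform perturbation; not taken from the paper.)"""
    noise = rng.uniform(-spread, spread, size=(batch_size, nominal.shape[0]))
    return nominal + noise

def relabel_batch(obs, next_obs, reward_features, weights):
    """Recompute rewards under the sampled parameters and append the parameters
    to the observations, so a critic or policy can be conditioned on them."""
    # Assumed linear reward family: r_w = sum_i w_i * feature_i
    rewards = np.sum(reward_features * weights, axis=1)
    cond_obs = np.concatenate([obs, weights], axis=1)
    cond_next_obs = np.concatenate([next_obs, weights], axis=1)
    return cond_obs, rewards, cond_next_obs

# Toy replay batch, standing in for data collected under the nominal reward only.
rng = np.random.default_rng(0)
batch, obs_dim, n_terms = 4, 3, 2
obs = rng.normal(size=(batch, obs_dim))
next_obs = rng.normal(size=(batch, obs_dim))
reward_features = rng.normal(size=(batch, n_terms))  # per-transition reward terms
nominal = np.array([1.0, 0.1])                       # hypothetical nominal weights

weights = sample_reward_weights(rng, batch, nominal)
cond_obs, rewards, cond_next_obs = relabel_batch(obs, next_obs, reward_features, weights)
```

In this sketch the same stored transitions serve every sampled reward parameterization, and the relabeled batch could be passed to any standard off-policy update; how RCRL actually samples parameterizations and conditions its networks is specified in the paper, not here.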

RLHF & Preference Learning | Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 76

Year: 2026

Venue: N/A
