Mar 12, 2026 · arXiv:2603.12087

Cross-Domain Policy Optimization via Bellman Consistency and Hybrid Critics

Minghui Chen, Ming-Hong Chen, Kuan-Chen Pan, You-De Huang, Xi Liu, Ping-Chun Hsieh

AI Summary

This paper introduces $Q$Avatar, a cross-domain reinforcement learning (CDRL) approach for transferring knowledge between source and target domains whose state or action spaces may differ. It uses cross-domain Bellman consistency to measure the transferability of a source-domain model. $Q$Avatar combines the Q functions from both domains using an adaptive, hyperparameter-free weight function, which yields reliable knowledge transfer and favorable transferability on locomotion and robot arm manipulation tasks.

Key Contribution

$Q$Avatar adaptively combines source-domain and target-domain Q functions through a hyperparameter-free weight function, giving reliable cross-domain RL without hand-tuning how much to trust the source model.

Abstract

Cross-domain reinforcement learning (CDRL) aims to improve the data efficiency of RL by leveraging data samples collected in a source domain to facilitate learning in a similar target domain. Despite its potential, cross-domain transfer in RL faces two fundamental and intertwined challenges: (i) the source and target domains can have distinct state or action spaces, which makes direct transfer infeasible and requires more sophisticated inter-domain mappings; (ii) the transferability of a source-domain model in RL is not easily identifiable a priori, so CDRL is prone to negative transfer. In this paper, we propose to jointly tackle these two challenges through the lens of cross-domain Bellman consistency and a hybrid critic. Specifically, we first introduce the notion of cross-domain Bellman consistency as a way to measure the transferability of a source-domain model. Then, we propose $Q$Avatar, which combines the Q functions from both the source and target domains with an adaptive, hyperparameter-free weight function. Through this design, we characterize the convergence behavior of $Q$Avatar and show that it achieves reliable transfer in the sense that it effectively leverages a source-domain Q function for knowledge transfer to the target domain. Through experiments, we demonstrate that $Q$Avatar achieves favorable transferability across various RL benchmark tasks, including locomotion and robot arm manipulation. Our code is available at https://rl-bandits-lab.github.io/Cross-Domain-RL/.
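To make the hybrid-critic idea concrete, here is a minimal tabular sketch in Python. It is not the paper's algorithm: the abstract does not state the form of the weight function, so the inverse-Bellman-error weighting below is an assumption chosen for illustration. The helper names (`bellman_error`, `map_source_q`, `hybrid_q`) and the index-array form of the inter-domain mappings are likewise hypothetical.

```python
import numpy as np


def bellman_error(q, transitions, gamma=0.99):
    """Mean squared one-step Bellman error of a tabular Q on target-domain transitions."""
    errors = []
    for s, a, r, s_next, done in transitions:
        # Greedy bootstrap target; terminal transitions do not bootstrap.
        target = r + (0.0 if done else gamma * np.max(q[s_next]))
        errors.append((q[s, a] - target) ** 2)
    return float(np.mean(errors))


def map_source_q(q_src, state_map, action_map):
    """Express the source Q in target indices.

    Entry [s, a] of the result is q_src[state_map[s], action_map[a]].
    The index-array mappings are an illustrative assumption.
    """
    return q_src[np.ix_(state_map, action_map)]


def hybrid_q(q_tar, q_src_mapped, transitions, gamma=0.99, eps=1e-8):
    """Blend the two critics with a data-driven weight (assumed rule, not the paper's).

    alpha is the weight on the mapped source Q. It grows when the source Q is
    more Bellman-consistent on target data than the target Q is.
    """
    e_tar = bellman_error(q_tar, transitions, gamma)
    e_src = bellman_error(q_src_mapped, transitions, gamma)
    alpha = e_tar / (e_tar + e_src + eps)
    return (1.0 - alpha) * q_tar + alpha * q_src_mapped, alpha
```

In this sketch the weight is computed from data rather than tuned by hand, which mirrors the spirit of an adaptive weight: a source Q that violates the Bellman equation on target-domain transitions receives little weight, which limits negative transfer.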

Robotics & Embodied AI · Training Efficiency & Optimization · World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 54

Year: 2026

Venue: N/A
