Mar 15, 2026 · arXiv:2603.14554

MorFiC: Fixing Value Miscalibration for Zero-Shot Quadruped Transfer

Prakhar Mishra, Amir Hossain Raj, Xuesu Xiao, Dinesh Manocha

AI Summary

MorFiC addresses zero-shot cross-morphology transfer in quadrupedal robot locomotion by resolving value miscalibration in multi-morphology actor-critic training. The method conditions the critic network on robot morphology parameters, which yields morphology-specific value estimates within a shared network. Trained on a single source robot with morphology randomization, MorFiC outperforms morphology-conditioned PPO baselines on stable average speed and longest stable run, with reported speed gains of +16.1% on A1, ~2x on Cheetah, and ~5x on B1. It is also deployed zero-shot on Unitree Go1 and Go2 robots without fine-tuning.
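To make the idea of a morphology-conditioned critic concrete, here is a minimal sketch in Python with NumPy. It assumes a feature-wise scale-and-shift modulation of the hidden layer, which is one common way to condition a network on side information; the summary does not give the paper's actual architecture, so the class name `ModulatedCritic`, the layer sizes, and the morphology vector layout are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_linear(in_dim, out_dim):
    """Small random weights and a zero bias for one dense layer."""
    return rng.normal(0.0, 0.1, size=(in_dim, out_dim)), np.zeros(out_dim)


class ModulatedCritic:
    """Hypothetical critic whose hidden features are modulated by morphology.

    Assumption: modulation is a per-feature scale and shift predicted from
    the morphology vector. This is an illustration, not the paper's network.
    """

    def __init__(self, obs_dim, morph_dim, hidden=32):
        self.w_in, self.b_in = init_linear(obs_dim, hidden)
        # One linear map produces both the scale and the shift terms.
        self.w_mod, self.b_mod = init_linear(morph_dim, 2 * hidden)
        self.w_out, self.b_out = init_linear(hidden, 1)

    def value(self, obs, morph):
        # Shared trunk: the same weights are used for every morphology.
        h = np.tanh(obs @ self.w_in + self.b_in)
        # Morphology-specific scale (gamma) and shift (beta).
        gamma, beta = np.split(morph @ self.w_mod + self.b_mod, 2, axis=-1)
        h = (1.0 + gamma) * h + beta
        # One scalar value estimate per batch row.
        return (h @ self.w_out + self.b_out).squeeze(-1)


# Usage: identical observations, two made-up morphology vectors
# (for example normalized mass, leg length, motor gain).
critic = ModulatedCritic(obs_dim=8, morph_dim=3)
obs = rng.normal(size=(4, 8))
morph_a = np.tile(np.array([0.5, 0.2, 1.0]), (4, 1))
morph_b = np.tile(np.array([2.0, 0.6, 0.3]), (4, 1))
v_a = critic.value(obs, morph_a)
v_b = critic.value(obs, morph_b)
```

With this structure the trunk weights are shared, while the value estimate for the same observation can differ from one morphology to another.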

Key Contribution

MorFiC enables zero-shot quadruped locomotion across diverse robot morphologies by fixing value miscalibration with a morphology-conditioned critic, with reported speed gains of up to ~5x on unseen robots.

Abstract

Generalizing learned locomotion policies across quadrupedal robots with different morphologies remains a challenge. Policies trained on a single robot often break when deployed on embodiments with different mass distributions, kinematics, joint limits, or actuation constraints, forcing per-robot retraining. We present MorFiC, a reinforcement learning approach for zero-shot cross-morphology locomotion using a single shared policy. MorFiC resolves a key failure mode in multi-morphology actor-critic training: a shared critic tends to average incompatible value targets across embodiments, yielding miscalibrated advantages. To address this, MorFiC conditions the critic via morphology-aware modulation driven by robot physical and control parameters, generating morphology-specific value estimates within a shared network. Trained on a single source robot with morphology randomization in simulation, MorFiC transfers to unseen robots and surpasses morphology-conditioned PPO baselines in stable average speed and longest stable run on multiple targets, including speed gains of +16.1% on A1, ~2x on Cheetah, and ~5x on B1. We additionally show that MorFiC reduces the variance of the value-prediction error across morphologies and stabilizes the advantage estimates, demonstrating that improved value-function calibration corresponds to stronger transfer performance. Finally, we demonstrate zero-shot deployment on two Unitree robots, Go1 and Go2, without fine-tuning, indicating that critic-side conditioning is a practical approach for cross-morphology generalization.
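The failure mode named in the abstract, a shared critic averaging incompatible value targets, can be illustrated with a small worked example in Python. All numbers below are made up for illustration, and the advantage is simplified to "return minus value baseline"; the abstract does not specify the estimator used in the paper.

```python
import numpy as np

# Hypothetical returns observed from the same state on two morphologies.
returns_light = np.array([9.0, 10.0, 11.0])  # e.g. a small, fast robot
returns_heavy = np.array([1.0, 2.0, 3.0])    # e.g. a large, slow robot

# A critic with no morphology input can output only one value for this
# state. Under a squared-error loss the best such value is the pooled mean.
shared_value = np.concatenate([returns_light, returns_heavy]).mean()

# Simplified advantages with the shared baseline: every sample from the
# light robot looks good and every sample from the heavy robot looks bad,
# regardless of how well the action actually worked on that robot.
adv_light_shared = returns_light - shared_value
adv_heavy_shared = returns_heavy - shared_value

# Morphology-specific baselines center each robot's advantages around zero,
# so the sign reflects the action quality rather than the embodiment.
adv_light = returns_light - returns_light.mean()
adv_heavy = returns_heavy - returns_heavy.mean()

print("shared value:", shared_value)
print("shared-baseline advantages:", adv_light_shared, adv_heavy_shared)
print("per-morphology advantages:", adv_light, adv_heavy)
```

In this toy setting the shared baseline biases the advantage sign by embodiment, which is the kind of miscalibration a morphology-specific value estimate avoids.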

Robotics & Embodied AI · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
