The paper introduces BAT, a hierarchical reinforcement learning framework for humanoid robots that dynamically switches between agile and stable control policies to achieve long-horizon loco-manipulation. A switching policy, trained with expert guidance from sliding-horizon pre-evaluation, selects between two complementary whole-body RL controllers. Results on a Unitree G1 robot show BAT outperforms existing methods in diverse tasks by effectively balancing agility and stability.
Humanoid robots can now nimbly switch between agile and stable control on the fly, thanks to a new hierarchical RL approach that learns when to prioritize speed versus balance.
Despite recent advances in control, reinforcement learning, and imitation learning, developing a unified framework that achieves agile, precise, and robust whole-body behaviors, particularly in long-horizon tasks, remains challenging. Existing approaches typically follow one of two paradigms: coupled whole-body policies for global coordination, or decoupled policies for modular precision. Without a systematic method to integrate both, the trade-off between agility, robustness, and precision remains unresolved. In this work, we propose BAT, an online policy-switching framework that dynamically selects between two complementary whole-body RL controllers to balance agility and stability across different motion contexts. Our framework consists of two complementary modules: a switching policy learned via hierarchical RL with expert guidance from sliding-horizon policy pre-evaluation, and an option-aware VQ-VAE that predicts option preference from discrete motion token sequences for improved generalization. The final decision is obtained via confidence-weighted fusion of the two modules' outputs. Extensive simulations and real-world experiments on the Unitree G1 humanoid robot demonstrate that BAT enables versatile long-horizon loco-manipulation and outperforms prior methods across diverse tasks.
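The abstract does not give the exact fusion rule, so the following is only a minimal sketch of one plausible reading of "confidence-weighted fusion". It assumes each module emits a probability pair over the two options (agile, stable) together with a non-negative scalar confidence, and that the fused preference is the confidence-weighted average of the two pairs. The names `ModuleOutput`, `fuse`, and `OPTIONS` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical option labels; the paper's two controllers are described
# only as "agile" and "stable" whole-body RL controllers.
OPTIONS = ("agile", "stable")


@dataclass
class ModuleOutput:
    """Assumed output format of one decision module (not from the paper)."""
    probs: Tuple[float, float]  # preference over (agile, stable), sums to 1
    confidence: float           # non-negative weight for this module


def fuse(a: ModuleOutput, b: ModuleOutput) -> Tuple[str, Tuple[float, float]]:
    """Confidence-weighted average of two option preferences.

    Returns the selected option and the fused probability pair.
    This is an illustrative assumption, not the authors' implementation.
    """
    total = a.confidence + b.confidence
    if total <= 0:
        raise ValueError("at least one module must have positive confidence")
    fused = tuple(
        (a.confidence * pa + b.confidence * pb) / total
        for pa, pb in zip(a.probs, b.probs)
    )
    # Pick the option with the highest fused probability.
    best = max(range(len(OPTIONS)), key=lambda i: fused[i])
    return OPTIONS[best], fused


# Example: the switching policy mildly prefers "agile" but is unsure,
# while the VQ-VAE module prefers "stable" with higher confidence.
switching_policy = ModuleOutput(probs=(0.7, 0.3), confidence=0.2)
vqvae_module = ModuleOutput(probs=(0.2, 0.8), confidence=0.8)
choice, fused_probs = fuse(switching_policy, vqvae_module)
```

Under these assumptions, the more confident module dominates the decision, so a low-confidence switching policy can be overridden by the token-based predictor in motion contexts it has not seen, which is one way the second module could improve generalization.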