This paper introduces an end-to-end reinforcement learning controller for point-to-point navigation of spherical robots, addressing the mismatch between the planner and the tracker in traditional hierarchical methods. The controller takes proprioceptive information as input and directly outputs motor commands, incorporating a long history encoder, tailored reward functions, and curriculum learning to handle the unique dynamics of spherical robots. Experiments demonstrate high efficiency, stability, and adaptability in both simulation (88.87% success rate) and real-world scenarios; sim-to-real transfer is achieved through an MC-CMA-ES-based system identification method.
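The summary mentions that the policy consumes a long history of proprioceptive observations. As a rough illustration only (the paper's actual encoder is learned, and all names and dimensions below are hypothetical), a minimal sketch of maintaining and flattening such an observation history might look like:

```python
from collections import deque

class HistoryEncoder:
    """Toy stand-in for a long history encoder: keeps the last
    `horizon` proprioceptive observations and flattens them into a
    single policy input vector. A learned encoder would compress this
    further; sizes and names here are illustrative, not the paper's."""

    def __init__(self, obs_dim, horizon):
        self.obs_dim = obs_dim
        self.horizon = horizon
        # Zero-padded buffer so the encoding has a fixed size from step one.
        self.buf = deque([[0.0] * obs_dim for _ in range(horizon)],
                         maxlen=horizon)

    def push(self, obs):
        assert len(obs) == self.obs_dim
        self.buf.append(list(obs))

    def encode(self):
        # Flatten oldest-to-newest into one vector for the policy network.
        return [x for obs in self.buf for x in obs]

# Example: 3-D proprioception (e.g. roll, pitch, yaw rate), 4-step history.
enc = HistoryEncoder(obs_dim=3, horizon=4)
enc.push([0.1, 0.0, -0.2])
vec = enc.encode()  # length 3 * 4 = 12, newest observation last
```

A fixed-size flattened history is the simplest interface between a recurrent-free policy and time-dependent dynamics; the paper's encoder presumably replaces the flattening with a learned compression.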
Forget hierarchical planners: RL can directly control spherical robots for efficient point-to-point navigation, and the learned policies transfer from simulation to the real world.
Point-to-point navigation is an important ability for spherical robots. Traditional methods usually use a planner and a tracker for short-range target control. However, this hierarchical approach suffers from a mismatch between the two components. In this work, we propose an end-to-end controller based on reinforcement learning, designed for waypoint tracking after a path has been planned. Taking proprioceptive information such as the robot's position and orientation as input, our controller directly outputs motor commands to control the spherical robot. To adapt to the unique characteristics of a spherical robot, we design various reward functions, a long history encoder, and curriculum learning. We demonstrate that our policy can execute point-to-point tasks with high efficiency, stability, and adaptability to uncertain environments, achieving a success rate of 88.87% in simulation. To transfer the policy trained in simulation to the real world, we developed an MC-CMA-ES method for system identification to accurately model the simulator's parameters. This process significantly narrows the gap between simulation and reality, enabling our policy to achieve high stability and efficiency in real-world scenarios.
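The sim-to-real step fits simulator parameters so that simulated responses match logged real-robot data. The paper's method is MC-CMA-ES; as a hedged sketch of the same identification loop, the following uses a much simpler cross-entropy-style evolution strategy instead of CMA-ES, with a toy one-dimensional simulator whose gain and damping stand in for the real simulator parameters (all functions and values here are illustrative, not from the paper):

```python
import random

def simulate(params, command):
    # Hypothetical stand-in for the physics simulator: a toy 1-D
    # response whose gain and damping are the parameters to identify.
    gain, damping = params
    return gain * command - damping * command ** 2

def sysid_loss(params, real_log):
    # Mean squared error between simulated and "real" responses over
    # a log of (command, observed response) pairs.
    return sum((simulate(p := params, u) - y) ** 2
               for u, y in real_log) / len(real_log)

def identify(real_log, init=(1.0, 0.0), sigma=0.5, pop=32, iters=60, seed=0):
    # Simplified evolution-strategy loop (NOT the paper's MC-CMA-ES):
    # sample candidates around the mean, keep the elite quarter,
    # re-center the mean on the elites, and shrink the step size.
    rng = random.Random(seed)
    mean = list(init)
    for _ in range(iters):
        cand = [[m + sigma * rng.gauss(0, 1) for m in mean]
                for _ in range(pop)]
        cand.sort(key=lambda p: sysid_loss(p, real_log))
        elite = cand[: pop // 4]
        mean = [sum(p[i] for p in elite) / len(elite)
                for i in range(len(mean))]
        sigma *= 0.95
    return mean

# Synthetic "real-robot" log generated with true parameters
# gain = 2.0, damping = 0.3 (noise-free for clarity).
true_params = (2.0, 0.3)
log = [(u / 10.0, simulate(true_params, u / 10.0)) for u in range(1, 21)]
est = identify(log)  # should land near (2.0, 0.3)
```

Full CMA-ES additionally adapts a covariance matrix over the parameters, which matters when parameters are correlated or differently scaled; the loop structure (sample, rank by sim-vs-real error, update the search distribution) is the same.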