NUS · Eastern Institute of Technology · IQuest Research · PolyU · USTC

Jun 14, 2026 · arXiv:2606.15846

FlashNav: Ultra-Fast Policy Training for Robot Navigation within 20 Seconds

Shanze Wang, Yiwei Qian, Xinming Zhang, Jun Xue, Siwei Cheng, Xianghui Wang, Qingyuan Hu, Xiaoyu Shen, Wei Zhang

AI Summary

This paper introduces FlashNav, a novel GPU-first framework that enables ultra-fast training of deep reinforcement learning (DRL) policies for robot navigation, achieving policy training in less than 20 seconds. By optimizing the training loop to focus on essential navigation components while eliminating unnecessary rendering, FlashNav maintains high performance and a 100% success rate in simulated environments. The framework's effectiveness is further validated through successful transfers of learned policies to physical robots in various indoor settings, showcasing its practical applicability in real-world scenarios.

Key Contribution

Achieving robot navigation policy training in under 20 seconds could revolutionize the deployment of DRL in robotics.

Abstract

Deep reinforcement learning has shown strong potential for robot navigation, but its practical deployment is still limited by the long wall-clock cost of policy training. This paper presents FlashNav, a GPU-first framework for ultra-fast range-based robot navigation training. To the best of our knowledge, FlashNav is the first DRL-based robot navigation framework that reaches seconds-level policy training, with the fastest deployable policy trained in less than 20 seconds. The key idea is to align simulation with the navigation MDP: FlashNav preserves the essential components for velocity-level navigation, including occupancy geometry, range sensing, goal-conditioned control, robot motion dynamics, collision handling, termination, and reset, while removing unnecessary rendering and high-fidelity physical details from the training loop. Built on a batched bitmap simulator and a fully GPU-resident training pipeline with our FastDSAC learner, FlashNav generates massive parallel navigation transitions entirely on GPU. Experiments on TurtleBot2 and Unitree Go2 show that FlashNav achieves a 100% success rate in under 20 seconds on an RTX 5090 and remains within tens of seconds across desktop GPUs. The learned policies further transfer to physical wheeled and legged robots in static and dynamic indoor scenes, demonstrating that DRL-based navigation can be trained at seconds-level speed while preserving deployable obstacle-avoidance behavior.
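To make the "batched bitmap simulator" idea concrete, here is a minimal sketch of what range sensing and velocity-level stepping over many environments at once could look like. It is not FlashNav's implementation: it runs on CPU with NumPy rather than on GPU, and the function names (`cast_rays`, `step`), the point-robot unicycle model, the unit cell size, and the fixed-step ray marching are all assumptions made for illustration.

```python
import numpy as np


def cast_rays(grid, poses, n_rays=8, max_range=10.0, step_size=0.05):
    """Batched range sensing by ray marching on a 2D occupancy bitmap.

    grid:  (H, W) bool array, True = occupied; one cell is one unit (assumed).
    poses: (B, 3) array of (x, y, heading) for B parallel environments.
    Returns a (B, n_rays) array of distances, capped at max_range.
    """
    H, W = grid.shape
    B = poses.shape[0]
    # World-frame ray angles, shape (B, n_rays).
    offsets = np.linspace(0.0, 2.0 * np.pi, n_rays, endpoint=False)
    angles = poses[:, 2:3] + offsets[None, :]
    dx, dy = np.cos(angles), np.sin(angles)

    ranges = np.full((B, n_rays), max_range)
    hit = np.zeros((B, n_rays), dtype=bool)
    # March all rays of all environments forward together.
    for d in np.arange(step_size, max_range + step_size, step_size):
        ix = np.floor(poses[:, 0:1] + d * dx).astype(int)
        iy = np.floor(poses[:, 1:2] + d * dy).astype(int)
        outside = (ix < 0) | (ix >= W) | (iy < 0) | (iy >= H)
        occupied = outside | grid[np.clip(iy, 0, H - 1), np.clip(ix, 0, W - 1)]
        new_hit = occupied & ~hit  # rays that hit for the first time
        ranges[new_hit] = d
        hit |= occupied
    return ranges


def step(grid, poses, actions, dt=0.1):
    """Advance B point robots one control step; actions are (v, w) pairs.

    Returns the new poses and a (B,) bool array flagging collisions.
    """
    H, W = grid.shape
    new = poses.copy()
    new[:, 0] += actions[:, 0] * np.cos(poses[:, 2]) * dt
    new[:, 1] += actions[:, 0] * np.sin(poses[:, 2]) * dt
    new[:, 2] += actions[:, 1] * dt
    ix = np.floor(new[:, 0]).astype(int)
    iy = np.floor(new[:, 1]).astype(int)
    outside = (ix < 0) | (ix >= W) | (iy < 0) | (iy >= H)
    collided = outside | grid[np.clip(iy, 0, H - 1), np.clip(ix, 0, W - 1)]
    return new, collided
```

Because every array carries a leading batch dimension, the same code shape maps naturally onto a GPU array library; no rendering or rigid-body physics is involved, only bitmap lookups and kinematics.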

Robotics & Embodied AI · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
