NavGSim is introduced as a Gaussian Splatting-based simulator for generating high-fidelity, large-scale navigation environments, addressing the challenge of realistic environment simulation for robot learning. It uses a hierarchical 3D Gaussian Splatting framework to render photorealistic views of expansive scenes, and a Gaussian Splatting-based slice technique that extracts navigable areas from the reconstructed Gaussians to simulate navigation collisions. Experiments show that training a Vision-Language-Action (VLA) model on trajectories collected in NavGSim significantly improves its scene understanding and navigation performance, evaluated in both simulated and real-world environments.
NavGSim lets you train robots in photorealistic, large-scale simulated environments built with Gaussian Splatting, helping narrow the sim-to-real gap for navigation tasks.
Simulating realistic environments for robots is widely recognized as a critical challenge in robot learning, particularly in terms of rendering and physical simulation. This challenge becomes even more pronounced in navigation tasks, where trajectories often extend across multiple rooms or entire floors. In this work, we present NavGSim, a Gaussian Splatting-based simulator designed to generate high-fidelity, large-scale navigation environments. Built upon a hierarchical 3D Gaussian Splatting framework, NavGSim enables photorealistic rendering in expansive scenes spanning hundreds of square meters. To simulate collisions during navigation, we introduce a Gaussian Splatting-based slice technique that directly extracts navigable areas from the reconstructed Gaussians. Additionally, for ease of use, we provide comprehensive NavGSim APIs supporting multi-GPU development, including tools for custom scene reconstruction, robot configuration, policy training, and evaluation. To evaluate NavGSim's effectiveness, we train a Vision-Language-Action (VLA) model on trajectories collected from NavGSim and assess its performance in both simulated and real-world environments. Our results demonstrate that NavGSim significantly enhances the VLA model's scene understanding, enabling the policy to handle diverse navigation queries effectively.
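The abstract does not say how the slice is computed. As a rough illustration only, the pure-Python sketch below shows one generic way a horizontal slice through a set of Gaussians could be turned into a 2D occupancy grid for collision checks. Everything here is an assumption, not NavGSim's actual method: `Gaussian`, `slice_occupancy`, and `is_free` are hypothetical names; Gaussians are simplified to isotropic spheres (3D Gaussian Splatting uses anisotropic covariances); and the body-height band (`z_low`, `z_high`), the `k`-sigma extent, and the opacity threshold are illustrative parameters.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    # Hypothetical, simplified splat: isotropic std dev instead of a full 3D covariance.
    x: float
    y: float
    z: float
    sigma: float
    opacity: float


def slice_occupancy(gaussians, x_min, y_min, width, height, res,
                    z_low, z_high, k=2.0, min_opacity=0.5):
    """Return a height x width grid where True marks an occupied cell (illustrative sketch)."""
    grid = [[False] * width for _ in range(height)]
    for g in gaussians:
        if g.opacity < min_opacity:
            continue  # skip faint splats that are unlikely to be solid surfaces
        r = k * g.sigma
        # Keep only Gaussians whose vertical extent overlaps the robot's body-height band.
        if g.z + r < z_low or g.z - r > z_high:
            continue
        # Index range of cells covered by the footprint's bounding box, clipped to the map.
        i0 = max(0, math.floor((g.y - r - y_min) / res))
        i1 = min(height - 1, math.floor((g.y + r - y_min) / res))
        j0 = max(0, math.floor((g.x - r - x_min) / res))
        j1 = min(width - 1, math.floor((g.x + r - x_min) / res))
        for i in range(i0, i1 + 1):
            y0 = y_min + i * res
            ny = min(max(g.y, y0), y0 + res)  # closest point of the cell to the center (y)
            for j in range(j0, j1 + 1):
                x0 = x_min + j * res
                nx = min(max(g.x, x0), x0 + res)  # closest point of the cell to the center (x)
                # Mark the cell if the disc of radius r intersects it.
                if (nx - g.x) ** 2 + (ny - g.y) ** 2 <= r * r:
                    grid[i][j] = True
    return grid


def is_free(grid, x, y, x_min, y_min, res):
    """Collision check for a point; positions outside the map are treated as blocked."""
    i = math.floor((y - y_min) / res)
    j = math.floor((x - x_min) / res)
    if not (0 <= i < len(grid) and 0 <= j < len(grid[0])):
        return False
    return not grid[i][j]


# Usage: a wall at x = 2.0 m inside the body band, plus a lamp hanging above it.
wall = [Gaussian(x=2.0, y=n * 0.1, z=0.5, sigma=0.05, opacity=0.9) for n in range(40)]
lamp = Gaussian(x=1.0, y=1.0, z=2.5, sigma=0.1, opacity=0.9)
grid = slice_occupancy(wall + [lamp], x_min=0.0, y_min=0.0, width=40, height=40,
                       res=0.1, z_low=0.1, z_high=1.2)
print(is_free(grid, 1.0, 1.0, 0.0, 0.0, 0.1))  # True: the lamp lies above the slice
print(is_free(grid, 2.0, 1.0, 0.0, 0.0, 0.1))  # False: blocked by the wall
```

Marking every cell the footprint disc touches errs on the side of reporting obstacles, which is usually the safer failure mode for a collision check; a real system would also need to handle anisotropic splats and floor detection.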