The authors introduce POINav-Bench, a new benchmark for real-world vision-language navigation to a Point of Interest (POI), using 3D Gaussian Splatting to reconstruct 11 commercial areas. They also propose the POINav Brain-Action Framework, in which a "Brain" module performs POI-grounded reasoning to guide an "Action" module in predicting continuous waypoints. They further curate a dataset of 70K real-world signage-entrance pairs, and experiments on the benchmark indicate that the framework offers a viable path toward refining real-world POI-goal navigation.
Closing the sim-to-real gap in vision-language navigation requires benchmarks grounded in realistic 3D reconstructions, not just generated scenes.
Real-world navigation is fundamentally driven by Points of Interest (POIs), yet reaching a precise POI remains a critical "final-meters" challenge. Existing Vision-Language Navigation (VLN) benchmarks for POI-goal navigation often suffer from coarse granularity or significant sim-to-real gaps due to generated scenes. To bridge this gap, we present POINav-Bench, the first benchmark designed for closed-loop evaluation of real-world POI-goal navigation. It comprises 11 commercial areas reconstructed from real-world captures using 3D Gaussian Splatting (3DGS), covering 126,398 $\mathrm{m}^{2}$ in total and spanning 163 distinct POIs. With traversability-aware annotations and reference trajectories, POINav-Bench enables high-fidelity evaluation of navigation agents in realistic, POI-rich environments. Building on this, we propose the POINav Brain-Action Framework, in which a Brain module performs POI-grounded reasoning to guide an Action module in predicting continuous waypoints for real-world execution. We further curate the POINav-Dataset, containing 70K real-world signage-entrance pairs. Experiments show that our framework provides a viable path toward refining real-world POI-goal navigation.