The paper introduces LAMP, a neural language field-based navigation framework that learns a continuous, language-driven map for robot navigation, addressing the scalability issues of explicit language vector storage. LAMP encodes language features as an implicit neural field, enabling efficient coarse path planning with a sparse graph followed by gradient-based optimization for fine-grained pose refinement. Experiments in simulation and real-world environments demonstrate that LAMP achieves superior memory efficiency and goal-reaching accuracy compared to explicit methods by leveraging semantic similarities and a Bayesian framework for uncertainty modeling.
Robots can now navigate complex environments using natural language instructions with significantly improved memory efficiency and accuracy, thanks to a novel implicit neural field representation that replaces explicit language maps.
Recent advances in vision-language models have made zero-shot navigation feasible, enabling robots to interpret and follow natural language instructions without requiring labeled training data. However, existing methods that explicitly store language vectors in grid- or node-based maps struggle to scale to large environments due to excessive memory requirements and limited resolution for fine-grained planning. We introduce LAMP (Language Map), a novel neural language field-based navigation framework that learns a continuous, language-driven map and directly leverages it for fine-grained path generation. Unlike prior approaches, our method encodes language features as an implicit neural field rather than storing them explicitly at every location. By combining this implicit representation with a sparse graph, LAMP supports efficient coarse path planning and then performs gradient-based optimization in the learned field to refine poses near the goal. Our two-stage pipeline of coarse graph search followed by language-driven, gradient-guided optimization is the first application of an implicit language map to precise path generation. This refinement is particularly effective at selecting goal regions that were not directly observed, by leveraging semantic similarities in the learned feature space. To further enhance robustness, we adopt a Bayesian framework that models embedding uncertainty via the von Mises–Fisher distribution, thereby improving generalization to unobserved regions. To scale to large environments, LAMP employs a graph sampling strategy that prioritizes spatial coverage and embedding confidence, retaining only the most informative nodes and substantially reducing computational overhead. Our experimental results, both in NVIDIA Isaac Sim and in a real multi-floor building, demonstrate that LAMP outperforms existing explicit methods in both memory efficiency and fine-grained goal-reaching accuracy, opening new possibilities for scalable, language-driven robot navigation.
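The second stage of the pipeline, gradient-based pose refinement in the learned field, can be illustrated with a minimal toy sketch. Everything here is an illustrative assumption rather than the paper's implementation: the implicit field is stood in for by a fixed random two-layer MLP, the goal embedding is read from the field at a hypothetical goal pose (in the actual system it would come from a text encoder), and the gradient is taken by finite differences instead of backpropagation through the trained network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the learned implicit language field: a fixed random
# 2-layer MLP mapping a 2-D pose to a unit-norm embedding vector.
# (Hypothetical weights; the real field is trained from observations.)
W1 = rng.normal(size=(32, 2))
W2 = rng.normal(size=(8, 32))

def language_field(pose):
    """Query the implicit field at a continuous 2-D pose."""
    h = np.tanh(W1 @ pose)
    e = W2 @ h
    return e / np.linalg.norm(e)

def cosine(a, b):
    # Both inputs are unit-norm, so the dot product is the cosine.
    return float(a @ b)

# Hypothetical goal embedding: here we read the field at a known goal
# pose; in practice it would come from encoding the instruction text.
goal_pose = np.array([1.5, -0.8])
goal_emb = language_field(goal_pose)

def refine(pose, steps=100, lr=0.02, eps=1e-4):
    """Gradient ascent on cosine similarity to the goal embedding,
    with the gradient estimated by central finite differences."""
    pose = pose.copy()
    for _ in range(steps):
        grad = np.zeros_like(pose)
        for i in range(len(pose)):
            dp = np.zeros_like(pose)
            dp[i] = eps
            grad[i] = (cosine(language_field(pose + dp), goal_emb)
                       - cosine(language_field(pose - dp), goal_emb)) / (2 * eps)
        pose += lr * grad
    return pose

# Coarse stage output: imagine this pose is the graph node nearest the goal.
coarse_pose = np.array([1.0, -0.3])
refined_pose = refine(coarse_pose)
```

In this sketch the refined pose scores at least as well as the coarse graph node under the field's similarity to the goal embedding, which is the role the refinement step plays after the sparse-graph search in the abstract above.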