This paper introduces R2F, an LLM-free framework for zero-shot open-vocabulary object navigation that repurposes ray frontiers to represent direction-conditioned semantic hypotheses as navigation goals. Language-aligned features are sparsely stored at frontiers, enabling navigation via embedding-based frontier scoring and goal tracking within a classical mapping and planning pipeline. Experiments in Habitat-sim and on a real robot show R2F achieves competitive zero-shot performance with real-time execution, running up to 6x faster than VLM-based methods.
Ditch the LLM and still navigate: R2F achieves real-time zero-shot object navigation by repurposing ray frontiers for efficient, embedding-based goal selection.
Zero-shot open-vocabulary object navigation has progressed rapidly with the emergence of large Vision-Language Models (VLMs) and Large Language Models (LLMs), now widely used as high-level decision-makers instead of end-to-end policies. Although effective, such systems often rely on iterative large-model queries at inference time, introducing latency and computational overhead that limit real-time deployment. To address this problem, we repurpose ray frontiers (R2F), a recently proposed frontier-based exploration paradigm, to develop an LLM-free framework for indoor open-vocabulary object navigation. While ray frontiers were originally used to bias exploration using semantic cues carried along rays, we reinterpret frontier regions as explicit, direction-conditioned semantic hypotheses that serve as navigation goals. Language-aligned features accumulated along out-of-range rays are stored sparsely at frontiers, where each region maintains multiple directional embeddings encoding plausible unseen content. Navigation then reduces to embedding-based frontier scoring and goal tracking within a classical mapping and planning pipeline, eliminating iterative large-model reasoning. We further introduce R2F-VLN, a lightweight extension for free-form language instructions using syntactic parsing and relational verification without additional VLM or LLM components. Experiments in Habitat-sim and on a real robotic platform demonstrate zero-shot performance competitive with the state of the art at real-time execution speeds, with up to 6 times faster runtime than VLM-based alternatives.
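To make the idea of embedding-based frontier scoring concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes each frontier stores a few directional embeddings keyed by ray direction, that the target object is given as a text embedding in the same space, and that a frontier's score is the maximum cosine similarity over its directions. The names `Frontier`, `score_frontier`, and `select_goal`, the toy 3-dimensional vectors, and the max-similarity rule are all hypothetical choices for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Frontier:
    """Hypothetical frontier record: one embedding per outgoing ray direction."""
    name: str
    # Maps a 2D ray direction (dx, dy) to a language-aligned feature vector.
    directional_embeddings: dict = field(default_factory=dict)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_frontier(frontier, query):
    """Return (best_score, best_direction) over the frontier's directional embeddings."""
    best_score, best_direction = float("-inf"), None
    for direction, embedding in frontier.directional_embeddings.items():
        similarity = cosine(embedding, query)
        if similarity > best_score:
            best_score, best_direction = similarity, direction
    return best_score, best_direction


def select_goal(frontiers, query):
    """Pick the frontier (and direction) whose stored embedding best matches the query."""
    scored = [(score_frontier(f, query), f) for f in frontiers if f.directional_embeddings]
    if not scored:
        return None  # nothing to score; a real system would fall back to plain exploration
    (score, direction), frontier = max(scored, key=lambda item: item[0][0])
    return frontier, direction, score


# Toy data (assumed): 3-D vectors stand in for real language-aligned features.
query = [1.0, 0.0, 0.0]  # stand-in for the text embedding of the target object
frontiers = [
    Frontier("hall", {(1.0, 0.0): [0.9, 0.1, 0.0], (0.0, 1.0): [0.0, 1.0, 0.0]}),
    Frontier("kitchen", {(-1.0, 0.0): [0.2, 0.9, 0.1]}),
]
goal = select_goal(frontiers, query)
```

In this toy setup `goal` holds the selected frontier, the ray direction whose embedding matched best, and the similarity score. A classical planner would then navigate toward that frontier, and the score could be re-evaluated as the map updates to track the goal.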