Mar 2, 2026 · arXiv:2603.01813

SSMG-Nav: Enhancing Lifelong Object Navigation with Semantic Skeleton Memory Graph

Haochen Niu, Lantao Zhang, Xingwu Ji, Rendong Ying, Peilin Liu, Fei Wen

AI Summary

The paper introduces SSMG-Nav, a novel object navigation framework that leverages a Semantic Skeleton Memory Graph (SSMG) to consolidate past observations into a spatially aligned, persistent memory. This SSMG clusters entities into subgraphs, unifying entity- and space-level semantics, and uses a vision-language model (VLM) guided by multimodal prompts to infer target beliefs over destinations. Results on lifelong and standard ObjectNav benchmarks demonstrate that SSMG-Nav achieves higher success rates and greater path efficiency compared to strong baselines, effectively reducing backtracking.

Key Contribution

Robots can now navigate more efficiently in unfamiliar environments thanks to a memory graph that fuses visual, textual, and spatial information to reduce backtracking.

Abstract

Navigating to out-of-sight targets from human instructions in unfamiliar environments is a core capability for service robots. Despite substantial progress, most approaches underutilize reusable, persistent memory, constraining performance in lifelong settings. Many are additionally limited to single-modality inputs and employ myopic greedy policies, which often induce inefficient back-and-forth maneuvers (BFMs). To address such limitations, we introduce SSMG-Nav, a framework for object navigation built on a Semantic Skeleton Memory Graph (SSMG) that consolidates past observations into a spatially aligned, persistent memory anchored by topological keypoints (e.g., junctions, room centers). SSMG clusters nearby entities into subgraphs, unifying entity- and space-level semantics to yield a compact set of candidate destinations. To support multimodal targets (images, objects, and text), we integrate a vision-language model (VLM). For each subgraph, a multimodal prompt synthesized from memory guides the VLM to infer a target belief over destinations. A long-horizon planner then trades off this belief against traversability costs to produce a visit sequence that minimizes expected path length, thereby reducing backtracking. Extensive experiments on challenging lifelong benchmarks and standard ObjectNav benchmarks demonstrate that, compared to strong baselines, our method achieves higher success rates and greater path efficiency, validating the effectiveness of SSMG-Nav.
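
To make the planner's trade-off concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a handful of candidate destinations with 2D positions, a belief value per destination (standing in for the VLM's output), and straight-line distance as a stand-in for traversability cost. All names (`expected_path_length`, `plan_visit_sequence`, the example rooms) are hypothetical. It scores each visit order by the expected distance travelled until the target is found and picks the best order by brute force, which is only practical for a small candidate set.

```python
from itertools import permutations
from math import hypot


def expected_path_length(order, start, positions, belief):
    """Expected distance travelled before the target is found.

    Each destination contributes belief[name] times the cumulative
    distance walked by the time that destination is reached.
    """
    expected = 0.0
    travelled = 0.0
    here = start
    for name in order:
        x, y = positions[name]
        # Assumption: Euclidean distance approximates traversability cost.
        travelled += hypot(x - here[0], y - here[1])
        expected += belief[name] * travelled
        here = (x, y)
    return expected


def plan_visit_sequence(start, positions, belief):
    """Return the visit order with the lowest expected path length."""
    return min(
        permutations(positions),
        key=lambda order: expected_path_length(order, start, positions, belief),
    )


# Hypothetical example: three candidate destinations along a corridor.
start = (0.0, 0.0)
positions = {"kitchen": (1.0, 0.0), "bedroom": (-2.0, 0.0), "hallway": (5.0, 0.0)}
belief = {"kitchen": 0.2, "bedroom": 0.7, "hallway": 0.1}

best_order = plan_visit_sequence(start, positions, belief)
```

In this toy setup a nearest-first policy would visit the kitchen first, whereas the expected-length criterion visits the high-belief bedroom first even though it is farther away. That is the kind of belief-versus-cost trade-off the abstract describes, under the simplifying assumptions stated above.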

Multimodal Models, Robotics & Embodied AI, World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 39

Year: 2026

Venue: N/A
