Mar 4, 2026 · arXiv:2603.04056

Long-Term Visual Localization in Dynamic Benthic Environments: A Dataset, Footprint-Based Ground Truth, and Visual Place Recognition Benchmark

M. Larsen, Martin Kvisvik Larsen, Oscar Pizarro

AI Summary

This paper introduces a dataset for long-term visual localization in benthic environments. It comprises georeferenced AUV imagery from five benthic reference sites revisited over periods of up to six years, and includes raw and color-corrected stereo imagery, camera calibrations, and sub-decimeter registered camera poses. The authors also propose a ground-truthing method based on the overlap of 3D seafloor image footprints, designed to cope with limited georeferencing accuracy and limited image footprints. A benchmark of eight visual place recognition (VPR) methods shows significantly lower Recall@K on this dataset than on established terrestrial and underwater benchmarks. A comparison with location-based ground truth further shows that distance-threshold ground-truthing can overestimate Recall@K at sites with rugged terrain and altitude variations.
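As a reading aid for the Recall@K figures mentioned above, the following is a minimal sketch of how Recall@K is commonly computed in VPR evaluation: a query counts as a hit if at least one of its top-K retrieved reference images is a ground-truth match. The function name `recall_at_k`, the similarity-matrix input, and the choice to skip queries without any ground-truth link are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def recall_at_k(similarity, gt_links, ks=(1, 5, 10)):
    """Fraction of queries with at least one correct match in the top K.

    similarity: (n_queries, n_refs) array, higher means more similar.
    gt_links:   list of sets; gt_links[q] holds the reference indices that
                count as correct matches for query q (hypothetical format).
    """
    # Rank reference images for each query from most to least similar.
    ranking = np.argsort(-similarity, axis=1)

    # Assumption: queries with no ground-truth link are left out.
    evaluated = [q for q, links in enumerate(gt_links) if links]
    if not evaluated:
        return {k: float("nan") for k in ks}

    recalls = {}
    for k in ks:
        hits = sum(
            1 for q in evaluated
            if gt_links[q].intersection(ranking[q, :k].tolist())
        )
        recalls[k] = hits / len(evaluated)
    return recalls


# Toy example: three queries against three reference images.
scores = np.array([
    [0.9, 0.1, 0.5],
    [0.2, 0.8, 0.3],
    [0.4, 0.6, 0.7],
])
links = [{0}, {2}, set()]  # the third query has no valid match
print(recall_at_k(scores, links, ks=(1, 2)))
```

Because the ground-truth links decide what counts as a hit, changing how those links are built changes the reported Recall@K even when the retrieval results stay the same.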

Key Contribution

On this underwater dataset, state-of-the-art visual place recognition methods achieve significantly lower Recall@K than on established terrestrial and underwater benchmarks, and traditional distance-threshold ground-truthing can overestimate their performance at sites with rugged terrain and altitude variations.

Abstract

Long-term visual localization has the potential to reduce cost and improve mapping quality in optical benthic monitoring with autonomous underwater vehicles (AUVs). Despite this potential, long-term visual localization in benthic environments remains understudied, primarily due to the lack of curated datasets for benchmarking. Moreover, limited georeferencing accuracy and image footprints necessitate precise geometric information for accurate ground-truthing. In this work, we address these gaps by presenting a curated dataset for long-term visual localization in benthic environments and a novel method to ground-truth visual localization results for near-nadir underwater imagery. Our dataset comprises georeferenced AUV imagery from five benthic reference sites, revisited over periods up to six years, and includes raw and color-corrected stereo imagery, camera calibrations, and sub-decimeter registered camera poses. To our knowledge, this is the first curated underwater dataset for long-term visual localization spanning multiple sites and photic-zone habitats. Our ground-truthing method estimates 3D seafloor image footprints and links camera views with overlapping footprints, ensuring that ground-truth links reflect shared visual content. Building on this dataset and ground truth, we benchmark eight state-of-the-art visual place recognition (VPR) methods and find that Recall@K is significantly lower on our dataset than on established terrestrial and underwater benchmarks. Finally, we compare our footprint-based ground truth to a traditional location-based ground truth and show that distance-threshold ground-truthing can overestimate VPR Recall@K at sites with rugged terrain and altitude variations. Together, the curated dataset, ground-truthing method, and VPR benchmark provide a stepping stone for advancing long-term visual localization in dynamic benthic environments.
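The difference between location-based and footprint-based ground truth can be illustrated with a deliberately simplified sketch. It assumes a flat seafloor, a perfectly nadir-looking camera, and axis-aligned rectangular footprints, whereas the paper estimates 3D seafloor footprints. The field-of-view values, the 2 m distance threshold, the 10% overlap threshold, and all names (`View`, `footprint`, `overlap_ratio`, `distance_link`, `footprint_link`) are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class View:
    x: float         # easting of the camera in meters
    y: float         # northing of the camera in meters
    altitude: float  # height above the seafloor in meters


def footprint(view, hfov_deg=40.0, vfov_deg=30.0):
    """Axis-aligned footprint (xmin, ymin, xmax, ymax) on a flat seafloor.

    Assumption: nadir camera, no heading rotation, hypothetical FOV values.
    """
    half_w = view.altitude * math.tan(math.radians(hfov_deg) / 2)
    half_h = view.altitude * math.tan(math.radians(vfov_deg) / 2)
    return (view.x - half_w, view.y - half_h, view.x + half_w, view.y + half_h)


def overlap_ratio(a, b):
    """Intersection area divided by the smaller of the two footprint areas."""
    fa, fb = footprint(a), footprint(b)
    w = min(fa[2], fb[2]) - max(fa[0], fb[0])
    h = min(fa[3], fb[3]) - max(fa[1], fb[1])
    if w <= 0 or h <= 0:
        return 0.0
    area_a = (fa[2] - fa[0]) * (fa[3] - fa[1])
    area_b = (fb[2] - fb[0]) * (fb[3] - fb[1])
    return (w * h) / min(area_a, area_b)


def distance_link(a, b, threshold=2.0):
    """Location-based ground truth: link views closer than a threshold."""
    return math.hypot(a.x - b.x, a.y - b.y) <= threshold


def footprint_link(a, b, min_overlap=0.1):
    """Footprint-based ground truth: link views that share seafloor content."""
    return overlap_ratio(a, b) >= min_overlap


# Two views 1.5 m apart, seen from a low and from a higher altitude.
low_a, low_b = View(0.0, 0.0, 1.0), View(1.5, 0.0, 1.0)
high_a, high_b = View(0.0, 0.0, 3.0), View(1.5, 0.0, 3.0)
print(distance_link(low_a, low_b), footprint_link(low_a, low_b))
print(distance_link(high_a, high_b), footprint_link(high_a, high_b))
```

In this toy setup the low-altitude pair passes the distance threshold although the two footprints share no seafloor, so a retrieval of that pair would be scored as correct without any common visual content. This is the kind of case in which a distance threshold can inflate Recall@K when altitude varies.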

Computer Vision · Data Curation & Synthetic Data · Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 59

Year: 2026

Venue: N/A
