This paper introduces HE-VPR, a visual place recognition (VPR) framework for aerial images that uses height estimation to improve robustness to scale variation. The system uses a frozen DINOv2 backbone with two lightweight adapter branches. The first estimates height by retrieval from a compact height database. The second performs VPR within a height-specific sub-database. Experiments on multi-altitude datasets show that HE-VPR improves Recall@1 by up to 6.1% over ViT-based baselines while reducing memory usage by up to 90%.
HE-VPR achieves state-of-the-art aerial visual place recognition by decoupling height inference from place recognition with lightweight adapters on a frozen DINOv2 backbone, improving recall while reducing memory usage.
In this work, we propose HE-VPR, a visual place recognition (VPR) framework that incorporates height estimation. Our system decouples height inference from place recognition, so both modules can share a frozen DINOv2 backbone. Two lightweight bypass adapter branches are integrated into the system. The first estimates the height partition of the query image by retrieval from a compact height database. The second performs VPR within the corresponding height-specific sub-database. This adapter design reduces training cost and substantially shrinks the database search space. We also adopt a center-weighted masking strategy to further improve robustness to scale differences. Experiments on two challenging, self-collected multi-altitude datasets show that HE-VPR improves Recall@1 by up to 6.1% over state-of-the-art ViT-based baselines and reduces memory usage by up to 90%. These results indicate that HE-VPR offers a scalable and efficient solution for height-aware aerial VPR, enabling practical deployment in GNSS-denied environments. All code and datasets for this work have been released at https://github.com/hmf21/HE-VPR.
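The two-stage retrieval described above (estimate a height partition, then search only that partition's sub-database) can be sketched as follows. This is a minimal illustration under stated assumptions, not the released HE-VPR code. It assumes that the two adapter branches have already produced a height descriptor and a VPR descriptor for each image, and that height partitions are discrete labels. It also assumes that the partition comes from the top-1 neighbour in the height database and that both stages rank candidates by cosine similarity. The function names (`estimate_height_partition`, `build_sub_databases`, `height_aware_vpr`) and the toy data are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical sketch: descriptors are plain lists of floats, assumed to be the
# outputs of the height adapter and the VPR adapter on a shared frozen backbone.


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def estimate_height_partition(query_height_desc, height_db):
    """Stage 1 (assumed top-1 retrieval): return the partition label of the
    height-database entry most similar to the query's height descriptor.
    height_db is a list of (descriptor, partition_label) pairs."""
    _, label = max(height_db, key=lambda item: cosine(query_height_desc, item[0]))
    return label


def build_sub_databases(vpr_db):
    """Group (descriptor, partition_label, place_id) entries by partition label."""
    subs = defaultdict(list)
    for desc, label, place_id in vpr_db:
        subs[label].append((desc, place_id))
    return subs


def height_aware_vpr(query_height_desc, query_vpr_desc, height_db, sub_dbs, k=1):
    """Stage 2: rank only the sub-database of the estimated partition and
    return (partition_label, top-k place ids)."""
    label = estimate_height_partition(query_height_desc, height_db)
    candidates = sub_dbs.get(label, [])
    ranked = sorted(candidates, key=lambda item: cosine(query_vpr_desc, item[0]), reverse=True)
    return label, [place_id for _, place_id in ranked[:k]]


if __name__ == "__main__":
    # Toy data: two height partitions and three reference images.
    height_db = [([1.0, 0.0], "low"), ([0.0, 1.0], "high")]
    vpr_db = [
        ([1.0, 0.1, 0.0], "low", "A"),
        ([0.0, 1.0, 0.0], "low", "B"),
        ([1.0, 0.0, 0.1], "high", "C"),
    ]
    sub_dbs = build_sub_databases(vpr_db)
    label, top = height_aware_vpr([0.9, 0.1], [0.0, 0.9, 0.1], height_db, sub_dbs)
    print(label, top)  # low ['B']
```

In this sketch the second stage scans only one partition. Each query is therefore compared against the small height database plus a single sub-database, not the full reference set. How this relates to the reported memory savings depends on the released implementation.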