This paper introduces GeoSearch, a framework for worldwide image geolocalization that adds web-scale reverse image search to retrieval-augmented generation (RAG) pipelines. GeoSearch augments large multimodal model (LMM) prompts with both database-retrieved coordinates and textual evidence extracted from web pages, which improves geolocalization accuracy for scenes the reference database does not cover. A two-layer filtering mechanism, image matching followed by confidence-based gating, reduces noise from irrelevant web content.
Web-scale reverse image search, combined with a two-layer filtering mechanism (image matching followed by confidence-based gating), improves image geolocalization accuracy, even when reference databases lack relevant scenes.
Worldwide image geolocalization, which aims to predict the GPS coordinates of any image on Earth, remains challenging due to global visual diversity. Recent generative approaches based on Retrieval-Augmented Generation (RAG) and Large Multimodal Models (LMMs) leverage candidates retrieved from fixed databases for reasoning, but often struggle with scenes that are absent from the reference set. In this work, we propose GeoSearch, an open-world geolocation framework that integrates web-scale reverse image search into the RAG pipeline. GeoSearch augments LMM prompts with database-retrieved coordinates and textual evidence extracted from web pages. To mitigate noise from irrelevant content, we introduce a two-layer filtering mechanism consisting of image matching, followed by confidence-based gating. Experiments on standard benchmarks Im2GPS3k and YFCC4k demonstrate the superiority of GeoSearch under leakage-aware evaluation. Our code and data are publicly available to support reproducibility.
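The two-layer filtering and prompt augmentation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `WebHit` structure, the function names, the thresholds, and the choice to gate on the best image-match score are all assumptions made for illustration, since the abstract does not specify how matching scores or confidence are computed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WebHit:
    """One reverse-image-search result (hypothetical structure)."""
    url: str
    text: str           # textual evidence extracted from the web page
    match_score: float  # assumed similarity of page image to query image, in [0, 1]


def filter_web_evidence(
    hits: List[WebHit],
    match_threshold: float = 0.6,  # assumed value, layer 1
    gate_threshold: float = 0.8,   # assumed value, layer 2
) -> List[WebHit]:
    """Return the web hits that survive both filtering layers."""
    # Layer 1 (image matching): drop pages whose image does not match the query.
    matched = [h for h in hits if h.match_score >= match_threshold]

    # Layer 2 (confidence-based gating): use web evidence only when the best
    # match is confident enough. Gating on the top score is an assumption.
    if not matched or max(h.match_score for h in matched) < gate_threshold:
        return []
    return sorted(matched, key=lambda h: h.match_score, reverse=True)


def build_prompt(
    db_coords: List[Tuple[float, float]],
    evidence: List[WebHit],
) -> str:
    """Assemble an LMM prompt from database coordinates and web text."""
    lines = ["Estimate the GPS coordinates (latitude, longitude) of the image."]
    lines.append("Coordinates of similar database images:")
    lines += [f"- ({lat:.4f}, {lon:.4f})" for lat, lon in db_coords]
    if evidence:  # web text is added only if it passed both layers
        lines.append("Text from web pages showing a matching image:")
        lines += [f"- {h.text}" for h in evidence]
    return "\n".join(lines)
```

In this sketch, a query whose web results all fall below the gate falls back to a database-only prompt, which mirrors the stated goal of keeping irrelevant web content out of the LMM's context.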