The paper introduces CORE, a new million-scale dataset for cross-modal geo-localization (CMGL) comprising paired ground-level text descriptions and geo-tagged aerial imagery from 225 regions across all continents. To generate high-quality text descriptions, the authors leverage zero-shot reasoning from Large Vision-Language Models (LVLMs). They also propose a physical-law-aware network (PLANET) that uses a contrastive learning paradigm to guide textual representations in capturing physical signatures from satellite imagery, achieving state-of-the-art performance on the CORE dataset.
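The summary says PLANET is trained with a contrastive objective that pulls each text description toward its matching satellite image. It does not give the exact PLANET loss or how the physical signatures enter it. The sketch below therefore shows only a generic symmetric InfoNCE (CLIP-style) loss as a stand-in for the contrastive paradigm. The function names, the temperature of 0.07 and the toy embeddings are illustrative assumptions, not the authors' code.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against all-zero vectors
    return [x / norm for x in v]


def cross_entropy_diag(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(text_emb: List[Vector], image_emb: List[Vector],
                       temperature: float = 0.07) -> float:
    """Generic contrastive loss (assumed stand-in, not PLANET's exact objective).

    Pair i (text i, aerial image i) is the positive; every other item in
    the batch acts as a negative, in both the text->image and image->text
    directions.
    """
    t = [l2_normalize(v) for v in text_emb]
    g = [l2_normalize(v) for v in image_emb]
    # logits[i][j] = cos(text_i, image_j) / temperature
    logits = [[sum(a * b for a, b in zip(ti, gj)) / temperature for gj in g]
              for ti in t]
    transposed = [list(col) for col in zip(*logits)]  # image -> text direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))


# Toy usage: correctly paired embeddings should give a lower loss than swapped ones.
texts = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]
print(symmetric_info_nce(texts, aligned) < symmetric_info_nce(texts, swapped))  # True
```

Averaging both directions is a common design choice because it trains text-to-image and image-to-text retrieval at the same time.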
A million-scale dataset of globally diverse cross-modal geo-localization pairs, together with a new physical-law-aware network, greatly extends the geographic coverage of existing CMGL benchmarks and moves toward truly universal positioning systems.
Cross-modal Geo-localization (CMGL) matches ground-level text descriptions with geo-tagged aerial imagery, a capability crucial for pedestrian navigation and emergency response. However, existing research is constrained by narrow geographic coverage and limited scene diversity, failing to reflect the immense spatial heterogeneity of global architectural styles and topographic features. To bridge this gap and facilitate universal positioning, we introduce CORE, the first million-scale dataset dedicated to global CMGL. CORE comprises 1,034,786 cross-view images sampled from 225 distinct geographic regions across all continents, offering an unprecedented variety of viewpoints across diverse environmental conditions and urban layouts. We leverage the zero-shot reasoning capabilities of Large Vision-Language Models (LVLMs) to synthesize high-quality scene descriptions rich in discriminative cues. Furthermore, we propose a physical-law-aware network (PLANET) for cross-modal geo-localization. PLANET introduces a novel contrastive learning paradigm that guides textual representations to capture the intrinsic physical signatures of satellite imagery. Extensive experiments across varied geographic regions demonstrate that PLANET significantly outperforms state-of-the-art methods, establishing a new benchmark for robust, global-scale geo-localization. The dataset and source code will be released at https://github.com/YtH0823/CORE.
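The abstract frames CMGL as matching a text description to its geo-tagged aerial image but does not state the evaluation metric. Text-to-image retrieval of this kind is commonly scored with Recall@K, so the sketch below assumes that protocol: each text description is a query, the aerial images form the gallery, and a query counts as localized if its true image ranks in the top K by cosine similarity. All function names and toy embeddings are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_gallery(query: Vector, gallery: List[Vector]) -> List[int]:
    """Indices of gallery (aerial image) embeddings, most similar first."""
    scores = [cosine(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda j: scores[j], reverse=True)


def recall_at_k(queries: List[Vector], gallery: List[Vector],
                ground_truth: List[int], k: int) -> float:
    """Fraction of text queries whose true aerial image appears in the top k."""
    hits = sum(1 for q, gt in zip(queries, ground_truth)
               if gt in rank_gallery(q, gallery)[:k])
    return hits / len(queries)


# Toy usage with 2-D embeddings (real models use hundreds of dimensions).
gallery = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # aerial-image embeddings
queries = [[0.9, 0.2], [0.6, 0.8]]              # text-description embeddings
truth = [0, 1]                                  # index of each query's true image
print(recall_at_k(queries, gallery, truth, k=1))  # 0.5
print(recall_at_k(queries, gallery, truth, k=2))  # 1.0
```

Here the second query ranks the diagonal image `[0.7, 0.7]` above its true match, so it only counts as a hit once K reaches 2.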