Melbourne · Mar 10, 2026 · arXiv:2603.09377

SinGeo: Unlock Single Model's Potential for Robust Cross-View Geo-Localization

AI Summary

The paper introduces SinGeo, a framework for cross-view geo-localization that enables a single model to achieve robustness across diverse field-of-view (FoV) conditions without additional modules. SinGeo uses a dual discriminative learning architecture to improve intra-view discriminability and incorporates a curriculum learning strategy to handle varying FoV difficulties. Experiments on four benchmark datasets show that SinGeo achieves state-of-the-art results and exhibits cross-architecture transferability. The authors also propose a consistency evaluation method for assessing model stability.

Key Contribution

Instead of training separate models for different fields of view in geo-localization, SinGeo achieves SOTA robustness with a single model, even outperforming methods trained specifically for extreme FoVs.

Abstract

Robust cross-view geo-localization (CVGL) remains challenging despite the surge in recent progress. Existing methods still rely on field-of-view (FoV)-specific training paradigms, where models are optimized under a fixed FoV but collapse when tested on unseen FoVs and unknown orientations. This limitation necessitates deploying multiple models to cover diverse variations. Although studies have explored dynamic FoV training by simply randomizing FoVs, they fail to achieve robustness across diverse conditions because they implicitly assume that all FoVs are equally difficult. To address this gap, we present SinGeo, a simple yet powerful framework that enables a single model to realize robust cross-view geo-localization without additional modules or explicit transformations. SinGeo employs a dual discriminative learning architecture that enhances intra-view discriminability within both ground and satellite branches, and is the first to introduce a curriculum learning strategy to achieve robust CVGL. Extensive evaluations on four benchmark datasets reveal that SinGeo sets state-of-the-art (SOTA) results under diverse conditions, and notably outperforms methods specifically trained for extreme FoVs. Beyond superior performance, SinGeo also exhibits cross-architecture transferability. Furthermore, we propose a consistency evaluation method to quantitatively assess model stability under varying views, providing an explainable perspective for understanding and advancing robustness in future CVGL research. Code will be available upon acceptance.
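The abstract does not describe how the curriculum is scheduled, so the following is only a minimal sketch of what an easy-to-hard FoV curriculum with unknown orientation could look like, not the authors' implementation. The function names, the linear schedule, and the 360 and 70 degree endpoints are all illustrative assumptions; the panorama is modeled as a plain list of image columns.

```python
import random


def fov_for_epoch(epoch, total_epochs, start_fov=360.0, end_fov=70.0):
    """Hypothetical schedule: linearly narrow the FoV from easy (wide) to hard (narrow).

    The endpoints and the linear shape are assumptions, not values from the paper.
    """
    if total_epochs <= 1:
        return end_fov
    # Fraction of training completed, clamped to [0, 1].
    t = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return start_fov + t * (end_fov - start_fov)


def crop_panorama(columns, fov_deg, rng):
    """Simulate a limited-FoV view with unknown orientation.

    `columns` stands in for the columns of a 360-degree ground panorama.
    A window covering `fov_deg` degrees is taken from a random heading,
    wrapping around the panorama seam.
    """
    width = len(columns)
    keep = max(1, round(width * fov_deg / 360.0))
    start = rng.randrange(width)  # random heading = unknown orientation
    return [columns[(start + i) % width] for i in range(keep)]


if __name__ == "__main__":
    rng = random.Random(0)
    panorama = list(range(360))  # one placeholder column per degree
    for epoch in range(0, 10, 3):
        fov = fov_for_epoch(epoch, total_epochs=10)
        view = crop_panorama(panorama, fov, rng)
        print(f"epoch {epoch}: fov={fov:.1f} deg, columns kept={len(view)}")
```

Plain random FoV sampling, which the abstract contrasts with, would instead draw the FoV uniformly at every step regardless of training progress.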

Computer Vision · Multimodal Models · Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 33

Year: 2026

Venue: N/A
