Mar 3, 2026 | arXiv:2603.02726

Cross-view geo-localization, Image retrieval, Multiscale geometric modeling, Frequency domain enhancement

AI Summary

This paper addresses cross-view geo-localization (CVGL) by proposing a Spatial and Frequency Domain Enhancement Network (SFDE) to handle geometric asymmetry and texture inconsistency. SFDE uses a three-branch architecture to model global semantic context, local geometric structure, and frequency domain stability, capturing cross-domain consistency from multiple perspectives. Experiments demonstrate that SFDE achieves competitive or superior performance compared to state-of-the-art CVGL methods, while remaining computationally efficient.

Key Contribution

Achieves competitive, and in many cases state-of-the-art, cross-view geo-localization performance by explicitly modeling frequency-domain characteristics alongside spatial features, offering robustness to viewpoint changes and texture variations.

Abstract

Cross-view geo-localization (CVGL) aims to establish spatial correspondences between images captured from significantly different viewpoints and constitutes a fundamental technique for visual localization in GNSS-denied environments. Nevertheless, CVGL remains challenging due to severe geometric asymmetry, texture inconsistency across imaging domains, and the progressive degradation of discriminative local information. Existing methods predominantly rely on spatial-domain feature alignment, which is inherently sensitive to large-scale viewpoint variations and local disturbances. To alleviate these limitations, this paper proposes the Spatial and Frequency Domain Enhancement Network (SFDE), which leverages complementary representations from the spatial and frequency domains. SFDE adopts a three-branch parallel architecture to model global semantic context, local geometric structure, and statistical stability in the frequency domain, respectively, thereby characterizing consistency across domains from the perspectives of scene topology, multiscale structural patterns, and frequency invariance. The resulting complementary features are jointly optimized in a unified embedding space via progressive enhancement and coupled constraints, enabling the learning of cross-view representations with consistency across multiple granularities. Comprehensive experiments show that SFDE achieves competitive performance and in many cases even surpasses state-of-the-art methods, while maintaining a lightweight and computationally efficient design. Our code is available at https://github.com/Mashuaishuai669/SFDE.
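As a toy illustration of why frequency-domain statistics can be more stable than direct spatial alignment, the sketch below checks a standard property of the discrete Fourier transform: circularly shifting a signal changes only the phase of its spectrum, not the amplitude. This is not the SFDE implementation, and the abstract does not specify which frequency statistics the network uses. The function names `dft_amplitude` and `circular_shift` and the sample values are hypothetical, chosen only for this example.

```python
import cmath
import math


def dft_amplitude(signal):
    """Amplitude spectrum |X[k]| of a real 1-D signal (naive O(n^2) DFT)."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(coeff))
    return spectrum


def circular_shift(signal, offset):
    """Cyclically shift a list to the right by `offset` positions."""
    offset %= len(signal)
    return signal[-offset:] + signal[:-offset]


# Hypothetical 1-D "feature row"; the values are illustrative only.
row = [0.0, 1.0, 4.0, 2.0, 0.5, 0.0, 3.0, 1.5]
shifted = circular_shift(row, 3)

amp_a = dft_amplitude(row)
amp_b = dft_amplitude(shifted)

# Largest per-frequency difference between the two amplitude spectra.
max_gap = max(abs(a - b) for a, b in zip(amp_a, amp_b))

# The spatial signals differ, yet their amplitude spectra agree
# up to floating-point error.
print(shifted != row, max_gap < 1e-9)
```

Real cross-view changes involve perspective and scale differences rather than pure shifts, so this invariance holds only approximately in practice; the example conveys the intuition and makes no claim about the paper's mechanism.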

Computer Vision, Multimodal Models, Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
