College of Computer Science and Software Engineering · WHU · Xi’an University of Electronic Science
Apr 13, 2026 · arXiv:2604.11415

Observe Less, Understand More: Cost-aware Cross-scale Observation for Remote Sensing Understanding

Zhenghao Xie, Jing Xiao, Zhenqi Wang, Kexin Ma, Liang Liao, Mi Wang

AI Summary

The paper addresses cost-aware remote sensing understanding, where high-resolution (HR) imagery is expensive to acquire. The authors propose a unified framework that couples fine-grained HR sampling with cross-patch representation prediction, allowing effective task reasoning with fewer HR observations. They also introduce GL-10M, a large-scale benchmark of 10 million spatially aligned multi-resolution images for evaluating budget-constrained cross-scale reasoning.

Key Contribution

Achieves a better performance-cost trade-off on remote sensing recognition and retrieval by selectively sampling high-resolution imagery based on global context and fine-grained patch importance.

Abstract

Remote sensing understanding inherently requires multi-resolution observation, since different targets and application tasks demand different levels of spatial detail. While low-resolution (LR) imagery enables efficient global observation, high-resolution (HR) imagery provides critical local details at much higher acquisition cost and with limited coverage. This motivates a cross-scale sensing strategy that selectively acquires HR imagery based on LR global perception to improve task performance under a constrained cost. Existing HR sampling methods typically make selection decisions from isolated LR patches, ignoring fine-grained intra-patch importance and cross-patch contextual interactions, which leads to fragmented feature representations and suboptimal scene reasoning under sparse HR observations. To address this issue, we formulate cross-scale remote sensing understanding as a unified cost-aware problem that couples fine-grained HR sampling with cross-patch representation prediction, enabling more effective task reasoning with fewer HR observations. Furthermore, we present GL-10M, a large-scale benchmark of 10 million spatially aligned multi-resolution images, enabling systematic evaluation of budget-constrained cross-scale reasoning in remote sensing. Extensive experiments on recognition and retrieval tasks show that our method consistently achieves a superior performance-cost trade-off.
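To make the budget constraint concrete, the following is a minimal sketch of HR patch selection under a fixed acquisition budget. It is not the authors' method: the function names, the per-patch importance scores, and the uniform cost model are all assumptions made for illustration. It scores each LR patch independently, which is closer to the isolated-patch selection the abstract criticizes, and it does not model intra-patch importance or cross-patch representation prediction.

```python
from typing import List, Sequence


def select_hr_patches(scores: Sequence[float],
                      cost_per_patch: float,
                      budget: float) -> List[int]:
    """Pick the LR patches to re-acquire at HR under a cost budget.

    `scores` are hypothetical per-patch importance values derived from the
    LR image; every HR patch is assumed to cost the same amount.
    """
    if cost_per_patch <= 0:
        raise ValueError("cost_per_patch must be positive")
    # Number of HR acquisitions the budget can pay for.
    max_patches = int(budget // cost_per_patch)
    # Rank patch indices from most to least important.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Keep the top-ranked patches and return them in spatial (index) order.
    return sorted(ranked[:max_patches])


def hr_cost_fraction(selected: Sequence[int], num_patches: int) -> float:
    """Fraction of the scene that was observed at HR."""
    return len(selected) / num_patches if num_patches else 0.0


# Illustrative scores for six LR patches (made-up values).
scores = [0.10, 0.80, 0.05, 0.60, 0.30, 0.90]
chosen = select_hr_patches(scores, cost_per_patch=1.0, budget=3.0)
print(chosen)                                  # indices of patches to acquire at HR
print(hr_cost_fraction(chosen, len(scores)))   # share of patches observed at HR
```

A performance-cost curve of the kind the abstract refers to could be traced by sweeping `budget` and evaluating the downstream task at each setting.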

Computer Vision · Data Curation & Synthetic Data · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
