The paper addresses cost-aware remote sensing understanding, where high-resolution (HR) imagery is expensive to acquire. The authors propose a unified framework that couples fine-grained HR sampling with cross-patch representation prediction, enabling effective task reasoning with fewer HR observations. They also introduce GL-10M, a large-scale benchmark of 10 million spatially aligned multi-resolution images for evaluating budget-constrained cross-scale reasoning.
Improve the performance-cost trade-off in remote sensing by selectively sampling high-resolution imagery based on global context and fine-grained patch importance.
Remote sensing understanding inherently requires multi-resolution observation, since different targets and application tasks demand different levels of spatial detail. While low-resolution (LR) imagery enables efficient global observation, high-resolution (HR) imagery provides critical local details at much higher acquisition cost and with limited coverage. This motivates a cross-scale sensing strategy that selectively acquires HR imagery from LR-based global perception to improve task performance under constrained cost. Existing HR sampling methods typically make selection decisions from isolated LR patches, ignoring fine-grained intra-patch importance and cross-patch contextual interactions, which leads to fragmented feature representation and suboptimal scene reasoning under sparse HR observations. To address this issue, we formulate cross-scale remote sensing understanding as a unified cost-aware problem that couples fine-grained HR sampling with cross-patch representation prediction, enabling more effective task reasoning with fewer HR observations. Furthermore, we present GL-10M, a large-scale benchmark of 10 million spatially aligned multi-resolution images, enabling systematic evaluation of budget-constrained cross-scale reasoning in remote sensing. Extensive experiments on recognition and retrieval tasks show that our method consistently achieves a superior performance-cost trade-off.
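To make the budget-constrained setting concrete, the following is a minimal sketch of selecting which LR patches to upgrade to HR under a fixed acquisition budget. It is not the paper's method: the abstract does not specify the selection rule, so this uses a plain greedy score-per-cost heuristic. The function name `select_hr_patches`, the per-patch `scores` (assumed to come from some LR-based importance model), and the per-patch `costs` are all hypothetical.

```python
from typing import List, Sequence


def select_hr_patches(
    scores: Sequence[float],
    costs: Sequence[float],
    budget: float,
) -> List[int]:
    """Greedily pick patch indices to acquire in HR without exceeding the budget.

    scores: hypothetical importance of each LR patch (higher = more useful).
    costs:  hypothetical HR acquisition cost of each patch (must be > 0).
    budget: total acquisition cost allowed.
    """
    if len(scores) != len(costs):
        raise ValueError("scores and costs must have the same length")
    if any(c <= 0 for c in costs):
        raise ValueError("all costs must be positive")

    # Rank patches by importance per unit cost, best first.
    order = sorted(
        range(len(scores)),
        key=lambda i: scores[i] / costs[i],
        reverse=True,
    )

    chosen: List[int] = []
    spent = 0.0
    for i in order:
        # Take the patch only if it still fits in the remaining budget.
        if spent + costs[i] <= budget:
            chosen.append(i)
            spent += costs[i]
    return sorted(chosen)


if __name__ == "__main__":
    # Four patches with equal cost and a budget for two of them.
    print(select_hr_patches([0.9, 0.1, 0.5, 0.7], [1.0, 1.0, 1.0, 1.0], 2.0))
    # prints [0, 3]
```

This baseline scores each patch in isolation, which is the limitation the abstract attributes to existing approaches. The proposed framework instead accounts for intra-patch importance and cross-patch context.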