This work presents EarthMind, a novel vision-language framework for multi-granular and multi-sensor Earth Observation (EO) data understanding. It outperforms existing methods on multiple public EO benchmarks, showing its potential to handle both multi-granular and multi-sensor challenges in a unified framework.
Forget expensive labeled data: this VLM learns to "read" OpenStreetMap data to caption satellite images, achieving state-of-the-art remote sensing performance at a fraction of the cost.
EarthMind demonstrates that hierarchical cross-modal attention across optical and SAR data significantly boosts MLLM performance on Earth Observation tasks, outperforming models limited to single-sensor inputs.