This work presents EarthMind, a novel vision-language framework for unified understanding of multi-granular and multi-sensor Earth Observation (EO) data. It outperforms existing methods on multiple public EO benchmarks, showcasing its potential to handle both multi-granular and multi-sensor challenges in a single framework.
Image-size-agnostic vision transformers are now a practical reality, thanks to a new self-supervised pretraining method that keeps computational cost constant regardless of input resolution.
Autonomous vehicles can now better identify the unexpected, thanks to a new method that improves out-of-distribution detection by up to 20% without any retraining.
Forget expensive teacher models and manual labeling: a base VLM paired with OpenStreetMap data can annotate itself for remote sensing tasks, achieving state-of-the-art performance at a fraction of the cost.
EarthMind demonstrates that hierarchical cross-modal attention over optical and SAR data significantly boosts MLLM performance on Earth Observation tasks, outperforming models restricted to single-sensor inputs.