This work presents EarthMind, a novel vision-language framework for multi-granular and multi-sensor EO data understanding; it outperforms existing methods on multiple public EO benchmarks, showcasing its potential to handle both challenges in a unified framework.
Image-size-agnostic vision transformers are now a practical reality, thanks to a new self-supervised pretraining method that maintains constant computational cost regardless of input resolution.
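The blurb doesn't say how the constant cost is achieved; as a hedged sketch of one plausible mechanism, the snippet below pools any input down to a fixed token budget so the transformer's cost never depends on resolution. The module name and grid size are illustrative assumptions, not the paper's method.

```python
# Sketch: keep ViT compute constant across resolutions by adaptively pooling
# any image into a fixed grid of patch embeddings (fixed token count).
import torch
import torch.nn as nn


class FixedBudgetPatchEmbed(nn.Module):  # hypothetical name, for illustration
    """Embed an image of arbitrary H x W into a fixed number of tokens."""

    def __init__(self, in_ch=3, dim=384, grid=14):
        super().__init__()
        self.grid = grid                       # 14 x 14 = 196 tokens, always
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=1)

    def forward(self, x):                      # x: (B, C, H, W), any H, W
        x = self.proj(x)                       # (B, dim, H, W)
        x = nn.functional.adaptive_avg_pool2d(x, self.grid)  # (B, dim, g, g)
        return x.flatten(2).transpose(1, 2)    # (B, grid*grid, dim)


embed = FixedBudgetPatchEmbed()
tokens_small = embed(torch.randn(1, 3, 224, 224))
tokens_large = embed(torch.randn(1, 3, 1024, 768))
assert tokens_small.shape == tokens_large.shape  # same downstream cost
```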
Removing an object from a video isn't just about inpainting what's behind it; VOID ensures the *downstream physics* still make sense.
Autonomous vehicles can now better identify the unexpected, thanks to a new method that boosts out-of-distribution detection by up to 20% without retraining.
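The summary doesn't name the score the method uses; as a hedged illustration of the "no retraining" idea, here is the classic post-hoc energy score (Liu et al., 2020), computed directly from the logits of a frozen, already-trained classifier. The threshold value is a free parameter, not from the paper.

```python
# Post-hoc OOD scoring on a frozen classifier: no retraining required.
import torch


def energy_ood_score(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Higher score => more likely out-of-distribution.

    E(x) = -T * logsumexp(logits / T). OOD inputs tend to produce a lower
    logsumexp over the logits, hence a higher energy, than in-distribution ones.
    """
    return -temperature * torch.logsumexp(logits / temperature, dim=-1)


# Usage: flag inputs whose energy exceeds a threshold tuned on held-out
# in-distribution data.
logits = torch.randn(8, 10)                # stand-in for model(x)
is_ood = energy_ood_score(logits) > -2.0   # illustrative threshold
```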
Forget expensive teacher models and manual labeling: a base VLM paired with OpenStreetMap data can annotate itself for remote sensing tasks, achieving state-of-the-art performance at a fraction of the cost.
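The blurb doesn't spell out the annotation pipeline; as a hedged sketch of the general idea, the snippet below harvests free labels from OpenStreetMap for a remote sensing tile's bounding box via the public Overpass API. The tag-to-label mapping and the downstream VLM prompting step are assumptions for illustration.

```python
# Sketch: pull OSM tags inside a tile's bbox to seed pseudo-labels for a VLM,
# instead of paying for teacher models or human annotation.
import requests

OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def osm_labels_for_tile(south: float, west: float, north: float, east: float) -> set:
    """Return coarse land-use/object labels present inside a tile's bbox."""
    query = f"""
    [out:json][timeout:25];
    (
      way["building"]({south},{west},{north},{east});
      way["highway"]({south},{west},{north},{east});
      way["landuse"]({south},{west},{north},{east});
      way["natural"]({south},{west},{north},{east});
    );
    out tags;
    """
    elements = requests.post(OVERPASS_URL, data=query, timeout=30).json()["elements"]
    labels = set()
    for el in elements:
        tags = el.get("tags", {})
        for key in ("building", "highway", "landuse", "natural"):
            if key in tags:
                labels.add(f"{key}={tags[key]}")
    return labels


# e.g. {'building=yes', 'highway=residential', 'landuse=farmland'} can then
# seed prompts/pseudo-labels for the base VLM.
```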
EarthMind demonstrates that hierarchical cross-modal attention across optical and SAR data significantly boosts MLLM performance on Earth Observation tasks, outperforming models limited to single-sensor inputs.
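EarthMind's exact architecture isn't given in this summary; below is a minimal sketch of the general idea it names, with optical tokens cross-attending to SAR tokens at several spatial scales before the fused features feed the MLLM. The layer sizes, scale factors, and pooling scheme are illustrative assumptions.

```python
# Sketch: hierarchical cross-modal attention between optical and SAR tokens.
import torch
import torch.nn as nn


class HierarchicalCrossModalFusion(nn.Module):  # hypothetical name
    """Fuse optical and SAR token grids with cross-attention at several scales."""

    def __init__(self, dim=256, heads=8, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales  # pooling factors: fine -> coarse
        self.attn = nn.ModuleList(
            nn.MultiheadAttention(dim, heads, batch_first=True) for _ in scales
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, opt_tokens, sar_tokens, grid):   # (B, N, D), N = grid*grid
        B, N, D = opt_tokens.shape
        fused = opt_tokens
        for scale, attn in zip(self.scales, self.attn):
            # Coarsen SAR tokens so coarser levels supply wider spatial context.
            sar = sar_tokens.transpose(1, 2).reshape(B, D, grid, grid)
            sar = nn.functional.avg_pool2d(sar, scale).flatten(2).transpose(1, 2)
            ctx, _ = attn(query=fused, key=sar, value=sar)  # optical attends to SAR
            fused = self.norm(fused + ctx)                  # residual per scale
        return fused                                        # (B, N, D) fused tokens


fusion = HierarchicalCrossModalFusion()
out = fusion(torch.randn(2, 16 * 16, 256), torch.randn(2, 16 * 16, 256), grid=16)
```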