Search papers, labs, and topics across Lattice.
The paper introduces Delta-LLaVA, a novel multimodal large language model (MLLM) framework designed for multi-temporal remote sensing change understanding, addressing the "temporal blindness" of existing MLLMs. Delta-LLaVA incorporates Change-Enhanced Attention, Change-SEG with Change Prior Embedding, and Local Causal Attention to isolate visual differences, extract differentiable change features, and prevent cross-temporal leakage. Experiments on the new Delta-QA benchmark, comprising 180k visual question-answering samples, show that Delta-LLaVA outperforms generalist MLLMs and specialized segmentation models in change deduction and boundary localization.
MLLMs can now see time: Delta-LLaVA unlocks precise change detection in remote sensing by explicitly modeling temporal relationships, outperforming existing models on a new challenging benchmark.
While Multimodal Large Language Models (MLLMs) excel in general vision-language tasks, their application to remote sensing change understanding is hindered by a fundamental "temporal blindness". Existing architectures lack intrinsic mechanisms for multi-temporal contrastive reasoning and struggle with precise spatial grounding. To address this, we first introduce Delta-QA, a comprehensive benchmark comprising 180k visual question-answering samples. Delta-QA unifies pixel-level segmentation and visual question answering across bi- and tri-temporal scenarios, structuring change interpretation into four progressive cognitive dimensions. Methodologically, we propose Delta-LLaVA, a novel MLLM framework explicitly tailored for multi-temporal remote sensing interpretation. It overcomes the limitations of naive feature concatenation through three core innovations: a Change-Enhanced Attention module that systematically isolates and amplifies visual differences, a Change-SEG module utilizing Change Prior Embedding to extract differentiable difference features as input for the LLM, and Local Causal Attention to prevent cross-temporal contextual leakage. Extensive experiments demonstrate that Delta-LLaVA decisively outperforms leading generalist MLLMs and specialized segmentation models in complex change deduction and high-precision boundary localization, establishing a unified framework for earth observation intelligence.