This paper introduces SemDINO, an end-to-end network for semantic change detection (SCD) that targets two weaknesses of existing methods: insufficient cross-temporal alignment and poor robustness to pseudo-changes. It combines a dual-branch encoder, which fuses CNN features with frozen DINOv3 features, and a multi-scale temporal bidirectional transformer interaction module to improve semantic representation and change detection accuracy. The authors report that SemDINO outperforms existing methods on public remote sensing datasets, particularly in complex scenes with interference factors.
SemDINO is reported to outperform state-of-the-art semantic change detection methods, especially in complex scenes with interference factors.
Semantic change detection (SCD) aims to simultaneously locate land-cover changes and identify the semantic categories before and after the transition. However, existing methods suffer from insufficient cross-temporal alignment, weak multi-scale representation, and poor robustness to pseudo-changes caused by illumination, season, and registration noise. To address these issues, we propose a novel end-to-end semantic change detection network named SemDINO, which integrates a dual-branch encoder, multi-scale temporal interaction, semantic purification, change enhancement, and decoupled multi-task prediction into a unified framework. Specifically, we construct a dual-branch encoder that combines a CNN backbone with frozen DINOv3 features via gated pyramid fusion, enabling rich multi-scale semantic representation. Then, a multi-scale temporal bidirectional transformer interaction (M-TBTT) module is proposed to achieve global cross-temporal feature alignment and information interaction. To further enhance genuine changes and suppress pseudo-variations, we introduce three modules that work together: semantic purification (SCP), bidirectional change enhancement (BiChangeEnhance), and multi-scale change enhancement (MCE). Finally, a multi-branch CD prediction head is designed to jointly output a binary change mask, bi-temporal semantic maps, and an edge constraint. Extensive experiments on public remote sensing CD datasets demonstrate that SemDINO achieves superior performance and generalization ability compared with state-of-the-art methods, especially in complex scenarios with interference factors.
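The abstract does not spell out how the gated pyramid fusion is computed, so the following is only an illustrative sketch of one common form of gated fusion at a single pyramid level, not the authors' implementation. It assumes a per-pixel sigmoid gate computed from the concatenated CNN and ViT features, and a convex blend of the two. All names (`gated_fusion`, `init_params`, `resize_nearest`, `conv1x1`) are hypothetical, the weights are random rather than learned, and random arrays stand in for the CNN and frozen DINOv3 feature maps.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1x1(x, weight, bias):
    # 1x1 convolution as a per-pixel linear map.
    # x: (C_in, H, W), weight: (C_out, C_in), bias: (C_out,)
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

def resize_nearest(x, out_h, out_w):
    # Nearest-neighbour resize of a (C, H, W) feature map.
    _, h, w = x.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return x[:, rows[:, None], cols[None, :]]

def init_params(c_cnn, c_vit, rng):
    # Hypothetical parameters; in a real network these would be learned.
    return {
        "proj_w": 0.1 * rng.standard_normal((c_cnn, c_vit)),
        "proj_b": np.zeros(c_cnn),
        "gate_w": 0.1 * rng.standard_normal((c_cnn, 2 * c_cnn)),
        "gate_b": np.zeros(c_cnn),
    }

def gated_fusion(f_cnn, f_vit, params):
    # Assumed form: gate = sigmoid(conv1x1([f_cnn ; f_vit'])),
    # fused = gate * f_cnn + (1 - gate) * f_vit'
    _, h, w = f_cnn.shape
    f_vit = resize_nearest(f_vit, h, w)                         # match spatial size
    f_vit = conv1x1(f_vit, params["proj_w"], params["proj_b"])  # match channel count
    stacked = np.concatenate([f_cnn, f_vit], axis=0)
    gate = sigmoid(conv1x1(stacked, params["gate_w"], params["gate_b"]))
    return gate * f_cnn + (1.0 - gate) * f_vit, gate

# Stand-in features for one pyramid level (random, for illustration only).
rng = np.random.default_rng(0)
f_cnn = rng.standard_normal((8, 16, 16))   # CNN branch: 8 channels, 16x16
f_vit = rng.standard_normal((12, 4, 4))    # frozen ViT branch: 12 channels, 4x4
params = init_params(c_cnn=8, c_vit=12, rng=rng)
fused, gate = gated_fusion(f_cnn, f_vit, params)
print(fused.shape, gate.shape)
```

In a pyramid setting, the same step would be repeated at each scale with its own parameters. The gate lets the network decide, per pixel and per channel, how much to trust the CNN features versus the frozen foundation-model features.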