This paper introduces SST-CD, a spatially selective self-training framework for unsupervised building change detection from bi-temporal remote sensing images. SST-CD treats temporal discrepancies as candidate pseudo labels and excludes spatially unreliable pixels from supervision, which reduces the noise that generic temporal differences introduce into building-specific detection. In the reported experiments on LEVIR-CD, WHU-CD, and DSIFN-CD, SST-CD achieves higher F1 scores than existing unsupervised and label-free methods.
SST-CD filters out noise in temporal discrepancies to achieve state-of-the-art performance in unsupervised building change detection.
Unsupervised building change detection aims to learn building-change masks from unlabeled bi-temporal remote sensing images. Existing label-free methods often follow a discrepancy-to-mask paradigm, directly using temporal differences, frozen foundation-model responses, prompt-based outputs, or post-processing results as final change maps. Although these strategies provide annotation-free cues, they do not learn a task-specific building-change detector and remain vulnerable to the gap between generic temporal discrepancies and building-defined structural changes. In practice, such discrepancies are often noisy and task-irrelevant, as appearance shifts, registration errors, and non-building modifications can produce strong but misleading responses. To address this problem, we propose SST-CD, a spatially selective self-training framework that reformulates fully label-free building change detection as end-to-end detector learning under noisy pseudo supervision. SST-CD uses temporal discrepancies as candidate pseudo labels and trains the detector only on spatially reliable pixels, whose reliability is estimated by a local consistency criterion that filters inconsistent regions from supervision. To further stabilize noisy self-training, a lightweight feature adapter recalibrates bi-temporal features, while a prototype-based decoder produces compact change and no-change representations. Experiments on LEVIR-CD, WHU-CD, and DSIFN-CD show that SST-CD achieves F1 scores of 83.08%, 91.69%, and 86.60%, respectively, outperforming existing unsupervised and label-free baselines. Code will be made publicly available.
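The abstract does not spell out the local consistency criterion, so the following is only a minimal NumPy sketch of the general idea of spatially selective supervision, not the authors' implementation. It assumes that candidate pseudo labels come from a thresholded mean absolute difference between the two images, that a pixel counts as reliable when at least a fraction `tau` of its `k` x `k` neighbourhood shares its label, and that the loss is a binary cross-entropy averaged over reliable pixels only. The function names and the parameters `thresh`, `k`, and `tau` are all hypothetical.

```python
import numpy as np


def pseudo_labels(img_t1, img_t2, thresh=0.3):
    """Candidate pseudo labels from temporal discrepancy.

    Assumption: discrepancy is the channel-wise mean absolute difference
    of two (H, W, C) images, thresholded into a binary (H, W) map.
    """
    diff = np.abs(img_t1.astype(np.float64) - img_t2.astype(np.float64))
    return (diff.mean(axis=-1) > thresh).astype(np.float64)


def local_agreement(labels, k=3):
    """Fraction of each k x k window that shares the centre pixel's label."""
    pad = k // 2
    padded = np.pad(labels, pad, mode="edge")
    h, w = labels.shape
    window_sum = np.zeros((h, w))
    # Sum the k*k shifted copies of the label map (a box filter).
    for dy in range(k):
        for dx in range(k):
            window_sum += padded[dy:dy + h, dx:dx + w]
    frac_ones = window_sum / (k * k)
    return np.where(labels == 1, frac_ones, 1.0 - frac_ones)


def reliable_mask(labels, k=3, tau=0.8):
    """Hypothetical reliability rule: keep pixels whose neighbourhood agrees."""
    return local_agreement(labels, k) >= tau


def masked_bce(prob, labels, mask, eps=1e-7):
    """Binary cross-entropy averaged over reliable pixels only."""
    prob = np.clip(prob, eps, 1.0 - eps)
    loss = -(labels * np.log(prob) + (1.0 - labels) * np.log(1.0 - prob))
    n = mask.sum()
    return float((loss * mask).sum() / n) if n > 0 else 0.0
```

Under this rule an isolated "changed" pixel, such as one caused by a registration error, is dropped from supervision, while the interior of a coherent changed block and the uniform background are kept. Pixels on the boundary of a block are also dropped, which is a side effect of this particular criterion rather than something the abstract states.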