Beihang, NTU | Mar 19, 2026 | arXiv:2603.19077

Multi-Modal Building Change Detection for Large-Scale Small Changes: Benchmark and Baseline

Ye Wang, Wei Lu, Zhi-Hui You, Keyan Chen, Tongfei Liu, Kaiyu Li, Hongruixuan Chen, Qing-Ling Shu, Sibao Chen

AI Summary

This paper introduces LSMD, a new large-scale, high-resolution RGB-NIR building change detection dataset focused on small changes in complex, real-world scenarios. To exploit the multi-modal data, the authors propose MSCNet, a network architecture with three modules: neighborhood context enhancement, cross-modal alignment and interaction, and saliency-aware refinement. Experiments on LSMD show that, by fusing RGB and NIR information, MSCNet outperforms existing methods in fine-grained building change detection.
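The claim that MSCNet outperforms existing methods is normally judged with pixel-level scores on the "changed" class. This listing does not say which metrics LSMD reports, so the sketch below assumes the common set of precision, recall, F1 and IoU; the function name `change_detection_metrics` is hypothetical.

```python
from typing import Dict, List

Mask = List[List[int]]  # 1 = changed building pixel, 0 = unchanged


def change_detection_metrics(pred: Mask, gt: Mask) -> Dict[str, float]:
    """Pixel-level precision, recall, F1 and IoU for the 'changed' class (assumed metric set)."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("prediction and ground-truth masks must have the same shape")
    tp = fp = fn = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            if p and g:
                tp += 1  # changed pixel correctly detected
            elif p:
                fp += 1  # false alarm (e.g. an RGB pseudo-change)
            elif g:
                fn += 1  # missed change (e.g. a tiny structure)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}
```

For small targets a single wrong pixel moves these scores a lot. For example, `pred = [[0, 1, 1], [0, 0, 0]]` against `gt = [[0, 1, 0], [0, 1, 0]]` gives one true positive, one false positive and one false negative. Precision, recall and F1 are therefore all 0.5, and IoU is 1/3.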

Key Contribution

A new RGB-NIR dataset and network show how multi-modal fusion helps detect subtle, fine-grained building changes.

Abstract

Change detection in optical remote sensing imagery is susceptible to illumination fluctuations, seasonal changes, and variations in surface land-cover materials. Relying solely on RGB imagery often produces pseudo-changes and leads to semantic ambiguity in features. Incorporating near-infrared (NIR) information provides heterogeneous physical cues that are complementary to visible light, thereby enhancing the discriminability of building materials and tiny structures while improving detection accuracy. However, existing multi-modal datasets generally lack high-resolution and accurately registered bi-temporal imagery, and current methods often fail to fully exploit the inherent heterogeneity between these modalities. To address these issues, we introduce the Large-scale Small-change Multi-modal Dataset (LSMD), a bi-temporal RGB-NIR building change detection benchmark dataset targeting small changes in realistic scenarios, providing a rigorous testing platform for evaluating multi-modal change detection methods in complex environments. Based on LSMD, we further propose the Multi-modal Spectral Complementarity Network (MSCNet) to achieve effective cross-modal feature fusion. MSCNet comprises three key components: the Neighborhood Context Enhancement Module (NCEM) to strengthen local spatial details, the Cross-modal Alignment and Interaction Module (CAIM) to enable deep interaction between RGB and NIR features, and the Saliency-aware Multisource Refinement Module (SMRM) to progressively refine fused features. Extensive experiments demonstrate that MSCNet effectively leverages multi-modal information and consistently outperforms existing methods under multiple input configurations, validating its efficacy for fine-grained building change detection. The source code will be made publicly available at: https://github.com/AeroVILab-AHU/LSMD
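The order in which MSCNet applies its three modules (NCEM, then CAIM, then SMRM) can be illustrated with a toy, learning-free pipeline on single-channel grids. Everything here is an assumption for illustration. The abstract names the modules but not their internals, so `neighborhood_enhance`, `cross_modal_interact` and `saliency_refine` are hypothetical stand-ins: a 3x3 neighborhood mean, mutual sigmoid gating, and min-max saliency reweighting. Taking bi-temporal differences before fusion is also a simplification, not the authors' design.

```python
import math
from typing import List

Grid = List[List[float]]  # single-channel map (toy stand-in for a multi-channel feature tensor)


def neighborhood_enhance(x: Grid) -> Grid:
    """Hypothetical NCEM stand-in: add the 3x3 neighborhood mean to each pixel (residual)."""
    h, w = len(x), len(x[0])
    out: Grid = []
    for i in range(h):
        row = []
        for j in range(w):
            # Neighbors are clipped at the image border.
            vals = [x[a][b]
                    for a in range(max(0, i - 1), min(h, i + 2))
                    for b in range(max(0, j - 1), min(w, j + 2))]
            row.append(x[i][j] + sum(vals) / len(vals))
        out.append(row)
    return out


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def cross_modal_interact(rgb: Grid, nir: Grid) -> Grid:
    """Hypothetical CAIM stand-in: each modality is gated by the other, then the two are summed."""
    return [[r * sigmoid(n) + n * sigmoid(r) for r, n in zip(r_row, n_row)]
            for r_row, n_row in zip(rgb, nir)]


def saliency_refine(fused: Grid) -> Grid:
    """Hypothetical SMRM stand-in: boost responses by a min-max normalised saliency of |fused|."""
    mags = [abs(v) for row in fused for v in row]
    lo, hi = min(mags), max(mags)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat map
    return [[v * (1.0 + (abs(v) - lo) / span) for v in row] for row in fused]


def abs_diff(a: Grid, b: Grid) -> Grid:
    """Bi-temporal absolute difference (a simplification, not the authors' design)."""
    return [[abs(p - q) for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def toy_change_map(rgb_t1: Grid, rgb_t2: Grid, nir_t1: Grid, nir_t2: Grid,
                   threshold: float = 1.0) -> List[List[int]]:
    """Toy NCEM -> CAIM -> SMRM flow that thresholds the result into a binary change map."""
    d_rgb = neighborhood_enhance(abs_diff(rgb_t1, rgb_t2))
    d_nir = neighborhood_enhance(abs_diff(nir_t1, nir_t2))
    refined = saliency_refine(cross_modal_interact(d_rgb, d_nir))
    return [[1 if v > threshold else 0 for v in row] for row in refined]
```

With a single changed pixel at the center of 3x3 RGB and NIR grids, `toy_change_map` flags only that pixel. A real network would replace each stand-in with learned convolutions or attention over multi-channel encoder features.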

Topics: Computer Vision, Eval Frameworks & Benchmarks, Multimodal Models

Citation Metrics

Citations: 0
Influential citations: 0
References: 46
Year: 2026
Venue: N/A
