The paper introduces a Dual Loop Data Cleaning (DLDC) method to automatically generate high-quality remote sensing image-text training data by leveraging contrastive multimodal quality evaluations. DLDC uses an external generation loop (EGL), based on a multimodal foundation model, for layout description and an internal evaluation loop (IEL), based on contrastive learning metrics, to assess image-text matching. Fine-tuning T2I models with the cleaned dataset yields significant improvements in image generation quality, as evidenced by substantial reductions in FID, increases in CLIP and RemoteCLIP scores, and improved downstream segmentation performance.
Forget expensive human annotation: this dual-loop method automatically cleans remote sensing image-text datasets, boosting T2I model performance by over 35%.
Text-to-image (T2I) generation, offering flexible and intuitive synthetic data for downstream geoscience applications, has garnered increasing attention in recent years. Training a good T2I model requires high-quality, large-scale image–text datasets. However, obtaining such datasets in remote sensing (RS) is challenging because of high annotation costs and the domain-specific knowledge required. This study proposes a dual loop data cleaning (DLDC) method, which leverages contrastive multimodal quality evaluations to generate high-quality RS image–text training data automatically. By constructing an external generation loop (EGL) based on a multimodal foundation model and an internal evaluation loop (IEL) based on contrastive learning metrics, DLDC can automatically generate layout descriptions and evaluate the image–text matching degree of satellite images. The proposed approach effectively filters out noisy samples and curates a refined dataset without human intervention. Experimental results show that our dual loop evaluation can accurately determine the optimal data cleaning ratio for different scenes, improving image generation quality. Compared with the pretrained T2I models, our fine-tuned models reduce Fréchet Inception Distance values by over 35%, increase CLIP scores by more than 25%, and improve RemoteCLIP scores by over 10.5%. Furthermore, our DLDC method achieves superior performance compared to other state-of-the-art RS T2I models (e.g., Crs-diff, GeoRSSD, DiffusionSAT). Our data-cleaning method also improves downstream segmentation, yielding gains of 8.14% in mean IoU and 7.5% in mean accuracy over the same model trained on raw, uncleaned data. Experimental results demonstrate that our automatically generated image–text data is of similar quality to manually annotated data, opening new pathways for rapid, cost-effective, and reliable RS data generation.
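The core filtering idea described above can be sketched as score-based pruning at a chosen cleaning ratio. The snippet below is an illustrative simplification, not the paper's implementation: it assumes image–text matching scores (e.g., CLIP or RemoteCLIP cosine similarities) have already been computed, and simply discards the lowest-scoring fraction of pairs.

```python
# Illustrative sketch (not the authors' code): prune image-text pairs
# by a precomputed contrastive matching score, keeping the top
# (1 - cleaning_ratio) fraction, in the spirit of DLDC's internal
# evaluation loop. Scores below stand in for CLIP/RemoteCLIP similarities.

def clean_dataset(pairs, scores, cleaning_ratio):
    """Return the highest-scoring (1 - cleaning_ratio) fraction of pairs.

    pairs: list of (image_id, caption) tuples
    scores: image-text matching scores, one per pair
    cleaning_ratio: fraction of the noisiest samples to discard
    """
    ranked = sorted(zip(pairs, scores), key=lambda x: x[1], reverse=True)
    n_keep = int(len(ranked) * (1.0 - cleaning_ratio))
    return [pair for pair, _ in ranked[:n_keep]]

# Hypothetical example: four RS image-caption pairs with toy scores.
pairs = [("img_a", "dense urban blocks"), ("img_b", "river delta"),
         ("img_c", "clouds only"), ("img_d", "farmland grid")]
scores = [0.31, 0.28, 0.05, 0.26]
kept = clean_dataset(pairs, scores, cleaning_ratio=0.25)
# The mismatched pair ("img_c", "clouds only") is filtered out.
```

In the paper's pipeline, the cleaning ratio itself is not fixed by hand but determined per scene by the dual loop evaluation; the sketch only shows the pruning step once a ratio is chosen.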