School of Information and Communication Engineering

Dec 15, 2025

DiffORSINet: Salient Object Detection in Optical Remote Sensing Images via Conditional Diffusion Model

AI Summary

This paper introduces DiffORSINet, a diffusion model-based network for salient object detection in optical remote sensing images (ORSI-SOD) that addresses challenges related to ambiguous boundaries and complex backgrounds. The method formulates ORSI-SOD as a conditional mask generation problem, leveraging RGB images and time step guidance to iteratively refine salient object segmentation during denoising. A dedicated denoising network incorporating a Fourier frequency awareness module (FFAM) and a multilevel feature fusion module (MFFM) is proposed to enhance refinement capabilities and reduce background interference.
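The conditional mask generation idea summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the linear noise schedule, the deterministic (DDIM-style) update, and the `toy_denoiser` function are all assumptions made for illustration, and the real denoising network is a learned model rather than a brightness threshold.

```python
import numpy as np


def make_alpha_bar(num_steps: int) -> np.ndarray:
    """Cumulative signal-retention terms for a linear beta schedule (assumed)."""
    betas = np.linspace(1e-4, 2e-2, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(mask0, t, alpha_bar, rng):
    """Forward process: noise a clean mask (values in [-1, 1]) up to step t."""
    noise = rng.standard_normal(mask0.shape)
    return np.sqrt(alpha_bar[t]) * mask0 + np.sqrt(1.0 - alpha_bar[t]) * noise


def toy_denoiser(noisy_mask, image, t_frac):
    """Hypothetical stand-in for the learned denoising network.

    Predicts the clean mask from the noisy mask, the conditioning RGB image,
    and the normalized time step t_frac in [0, 1].
    """
    gray = image.mean(axis=-1)
    guess = np.where(gray > 0.5, 1.0, -1.0)  # crude image-based saliency cue
    w = 0.2 * (1.0 - t_frac)  # rely on the noisy mask less at noisier steps
    return (1.0 - w) * guess + w * np.clip(noisy_mask, -1.0, 1.0)


def sample_mask(image, num_steps=50, seed=0):
    """Reverse process: start from pure noise and refine the mask step by step."""
    rng = np.random.default_rng(seed)
    alpha_bar = make_alpha_bar(num_steps)
    x = rng.standard_normal(image.shape[:2])
    for t in range(num_steps - 1, -1, -1):
        x0_pred = toy_denoiser(x, image, t / max(num_steps - 1, 1))
        if t == 0:
            x = x0_pred
            break
        # Recover the implied noise, then step deterministically to t - 1.
        eps = (x - np.sqrt(alpha_bar[t]) * x0_pred) / np.sqrt(1.0 - alpha_bar[t])
        x = np.sqrt(alpha_bar[t - 1]) * x0_pred + np.sqrt(1.0 - alpha_bar[t - 1]) * eps
    return x > 0  # binarize the refined mask
```

Calling `sample_mask(image)` on an `(H, W, 3)` array with values in `[0, 1]` returns a boolean `(H, W)` mask. In a trained model the denoiser would be fitted on pairs produced by `q_sample` from ground-truth masks.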

Key Contribution

DiffORSINet shows that a conditional diffusion model can accurately segment salient objects in complex remote sensing imagery, outperforming existing methods on three ORSI-SOD datasets through frequency-domain feature enhancement and multilevel feature fusion.

Abstract

Salient object detection in optical remote sensing images (ORSI-SOD) has received increasing attention in recent years. Despite this progress, existing ORSI-SOD methods still struggle to delineate the ambiguous, irregular boundaries of salient targets and to cope with cluttered backgrounds. To address these problems, we propose DiffORSINet, a new diffusion-model-based network that formulates the ORSI-SOD task as a conditional mask generation problem. Conditioned on the RGB image and guided by the time step, it progressively locates salient targets and refines their segmentation during the denoising process. Furthermore, we design a dedicated denoising network that includes a Fourier frequency awareness module (FFAM) and a multilevel feature fusion module (MFFM), which significantly improve the refinement ability of the network. FFAM captures and fuses frequency-domain features by combining the Fourier transform with a cross-attention (CA) mechanism, enhancing the intensity of some signals and thereby refining image details. MFFM reduces interference from cluttered backgrounds by coordinating and fusing multilevel features and suppressing irrelevant regions. Finally, comparative experiments on three widely used ORSI-SOD datasets show that the proposed method outperforms existing methods. Our code and results are available at https://github.com/hyy-qd/DiffORSINet/
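To make the FFAM description more concrete, the following NumPy sketch combines a 2-D Fourier transform with a single-head cross-attention step. It is only a sketch under stated assumptions: the use of the log-amplitude spectrum, the absence of learned projections, the residual connection, and every function name here are hypothetical and not taken from the paper; the linked repository holds the actual module.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def frequency_features(feat):
    """Map a (C, H, W) feature map to its log-amplitude spectrum (same shape)."""
    spectrum = np.fft.fft2(feat, axes=(-2, -1))
    return np.log1p(np.abs(spectrum))


def cross_attention(query_tokens, context_tokens):
    """Scaled dot-product attention: queries attend to context tokens.

    No learned Q/K/V projections are used here (an assumption for brevity).
    """
    d = query_tokens.shape[-1]
    scores = query_tokens @ context_tokens.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ context_tokens, weights


def fuse_frequency(feat):
    """Fuse spatial features with frequency-domain features via cross-attention."""
    c, h, w = feat.shape
    spatial = feat.reshape(c, h * w).T                    # (HW, C) spatial tokens
    freq = frequency_features(feat).reshape(c, h * w).T   # (HW, C) frequency tokens
    attended, weights = cross_attention(spatial, freq)
    fused = spatial + attended                            # residual fusion (assumed)
    return fused.T.reshape(c, h, w), weights
```

Here the spatial tokens act as queries and the frequency tokens as keys and values, so each spatial location gathers frequency-domain information before the residual sum. A learned version would add projection matrices and normalization around the attention step.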

Architecture Design (Transformers, SSMs, MoE)

Computer Vision

Citation Metrics

Citations: 0

Influential citations: 0

References: 62

Year: 2025

Venue: IEEE Transactions on Geoscience and Remote Sensing
