Hainan University · SDU · Oct 27, 2025

FSCDiff: Frequency-Spatial Entangled Conditional Diffusion model for Underwater Salient Object Detection

Hua Li, Gaowei Lin, Zhiyuan Li, S. Kwong, Runmin Cong

AI Summary

The paper introduces FSCDiff, a novel Fourier-Spatial Entangled Conditional Diffusion model, to improve the accuracy and robustness of underwater salient object detection (USOD). FSCDiff addresses limitations of existing spatial-domain methods by incorporating Fourier-domain information and leveraging the iterative generation capabilities of diffusion models to handle insufficient representation and boundary shift issues. Experiments on USOD10K and USOD datasets demonstrate that FSCDiff outperforms state-of-the-art USOD methods.

Key Contribution

By fusing Fourier-domain and spatial-domain information from RGB images and depth maps within a conditional diffusion framework, FSCDiff improves the accuracy and robustness of underwater salient object detection and outperforms state-of-the-art approaches on the USOD10K and USOD datasets.

Abstract

Salient object detection (SOD) plays a crucial role in image understanding and visual guidance. However, due to the complexity of underwater environments, the accuracy of underwater salient object detection is often low. To improve the accuracy and robustness of underwater salient object detection, different from the existing spatial domain aware RGB-D methods that rely on pixel-level probabilities, we propose a novel Fourier-Spatial Entangled Conditional Diffusion model (FSCDiff) for underwater salient object detection. The FSCDiff aims to address the insufficient representation and boundary shift issues in underwater salient object detection by leveraging Fourier-domain information and the powerful multi-step iterative generation capability of diffusion models. The FSCDiff framework consists of two key components: the Dual-Domain Entanglement Enhancement Block (DTEB) and the Stable Time-step Mask Prediction Module (STMP). DTEB utilizes Fourier-spatial entanglement learning to fully exploit the Fourier and spatial domain information of RGB images and depth maps, thereby optimizing feature representation. STMP takes advantage of the excellent multi-step iterative mechanism of diffusion models to enhance the accuracy and robustness of the segmentation results. Comprehensive experimental results indicate that our FSCDiff method outperforms the state-of-the-art approaches on the USOD10K and USOD datasets. The source code is available at: https://github.com/lgwplay/FSCDiff.
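To make "Fourier-domain information" concrete, the following is a minimal NumPy sketch of splitting an image into Fourier amplitude and phase and stacking those alongside the spatial-domain inputs. It is not the paper's DTEB: the function names (`fourier_components`, `from_fourier_components`, `toy_dual_domain_features`), the grayscale reduction of the RGB image, and the log scaling of the amplitude are all assumptions made for illustration. The authors' actual implementation is in the repository linked above.

```python
import numpy as np


def fourier_components(image):
    """Return (amplitude, phase) of the 2D FFT over the last two axes."""
    spectrum = np.fft.fft2(image, axes=(-2, -1))
    return np.abs(spectrum), np.angle(spectrum)


def from_fourier_components(amplitude, phase):
    """Invert fourier_components: rebuild the spatial image from amplitude and phase."""
    spectrum = amplitude * np.exp(1j * phase)
    return np.fft.ifft2(spectrum, axes=(-2, -1)).real


def toy_dual_domain_features(rgb, depth):
    """Hypothetical helper (not from the paper): stack spatial and Fourier views.

    rgb   : array of shape (3, H, W)
    depth : array of shape (H, W)
    Returns an array of shape (6, H, W).
    """
    gray = rgb.mean(axis=0)  # assumption: reduce RGB to a single channel
    amp_rgb, phase_rgb = fourier_components(gray)
    amp_depth, phase_depth = fourier_components(depth)
    return np.stack([
        gray,                 # spatial-domain RGB (grayscale)
        depth,                # spatial-domain depth
        np.log1p(amp_rgb),    # log-scaled amplitude, to compress dynamic range
        phase_rgb,
        np.log1p(amp_depth),
        phase_depth,
    ])


# Random stand-ins for an RGB image and its depth map.
rng = np.random.default_rng(0)
rgb = rng.random((3, 32, 32))
depth = rng.random((32, 32))

features = toy_dual_domain_features(rgb, depth)

# Amplitude and phase together carry the same information as the spatial image.
amplitude, phase = fourier_components(depth)
reconstructed = from_fourier_components(amplitude, phase)
```

The round trip through `from_fourier_components` shows that amplitude and phase are a lossless re-expression of the spatial image. A learned block would presumably process the two domains with trainable layers rather than simply stacking them as done here.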

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Multimodal Models

Citation Metrics

Citations: 1

Influential citations: 0

References: 50

Year: 2025

Venue: ACM Multimedia
