Feb 23, 2026 · arXiv:2602.19944

Discover, Segment, and Select: A Progressive Mechanism for Zero-shot Camouflaged Object Segmentation

Yilong Yang, Jianxin Tian, Shengchuan Zhang, Liujuan Cao

AI Summary

This paper introduces Discover-Segment-Select (DSS), a progressive mechanism for zero-shot camouflaged object segmentation that addresses the limitations of relying solely on MLLMs for object discovery. DSS employs a Feature-coherent Object Discovery (FOD) module to generate diverse object proposals, refines these proposals using SAM segmentation, and then uses a Semantic-driven Mask Selection (SMS) module with MLLMs to select the best mask. Experiments demonstrate that DSS achieves state-of-the-art performance on multiple camouflaged object segmentation benchmarks, particularly in scenes with multiple instances.

Key Contribution

DSS achieves state-of-the-art zero-shot camouflaged object segmentation by combining visual features, SAM, and MLLMs, so that object discovery no longer depends on MLLMs alone.

Abstract

Current zero-shot Camouflaged Object Segmentation (COS) methods typically employ a two-stage pipeline (discover-then-segment): using MLLMs to obtain visual prompts, followed by SAM segmentation. However, relying solely on MLLMs for camouflaged object discovery often leads to inaccurate localization, false positives, and missed detections. To address these issues, we propose the Discover-Segment-Select (DSS) mechanism, a progressive framework designed to refine segmentation step by step. The proposed method contains a Feature-coherent Object Discovery (FOD) module that leverages visual features to generate diverse object proposals, a segmentation module that refines these proposals through SAM segmentation, and a Semantic-driven Mask Selection (SMS) module that employs MLLMs to evaluate and select the optimal segmentation mask from multiple candidates. Without requiring any training or supervision, DSS achieves state-of-the-art performance on multiple COS benchmarks, especially in multiple-instance scenes.
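The three stages described in the abstract can be read as a simple control flow: propose prompts, turn each prompt into a candidate mask, then pick the best candidate. The sketch below illustrates only that flow and is not the authors' implementation. The function name `discover_segment_select`, the `(row, col)` prompt convention, and the three `toy_*` stand-ins are all hypothetical; in the paper the corresponding roles are played by the FOD module, SAM, and an MLLM-based SMS module, whose actual interfaces are not given in the abstract.

```python
from typing import Callable, List, Tuple

import numpy as np

Point = Tuple[int, int]  # hypothetical prompt convention: (row, col)


def discover_segment_select(
    image: np.ndarray,
    discover: Callable[[np.ndarray], List[Point]],
    segment: Callable[[np.ndarray, Point], np.ndarray],
    score: Callable[[np.ndarray, np.ndarray], float],
) -> np.ndarray:
    """Run discover -> segment -> select and return one boolean mask."""
    # Stage 1 (discover): propose several candidate prompts.
    prompts = discover(image)
    # Stage 2 (segment): turn every prompt into a candidate mask.
    candidates = [segment(image, p) for p in prompts]
    if not candidates:
        return np.zeros(image.shape[:2], dtype=bool)
    # Stage 3 (select): score every candidate and keep the best one.
    scores = [score(image, m) for m in candidates]
    return candidates[int(np.argmax(scores))]


# Toy stand-ins (assumptions for illustration only, not FOD, SAM, or an MLLM).
def toy_discover(image: np.ndarray) -> List[Point]:
    # Fixed prompts: one on the background, one on the object.
    return [(0, 0), (3, 3)]


def toy_segment(image: np.ndarray, point: Point) -> np.ndarray:
    # Group pixels whose intensity is close to the prompted pixel.
    r, c = point
    return np.abs(image - image[r, c]) < 0.1


def toy_score(image: np.ndarray, mask: np.ndarray) -> float:
    # Prefer compact masks; a real selector would judge semantics instead.
    return -float(mask.sum())


image = np.zeros((8, 8))
image[2:5, 2:5] = 1.0  # a 3x3 "object" on a flat background
best_mask = discover_segment_select(image, toy_discover, toy_segment, toy_score)
```

Passing the stages in as callables keeps the selection step independent of how proposals and masks are produced, which mirrors the abstract's point that selection happens after multiple candidates exist rather than trusting a single discovery step.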

Computer Vision, Multimodal Models, Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
