Apr 7, 2026 · arXiv:2604.05433

Few-Shot Semantic Segmentation Meets SAM3

AI Summary

This paper explores using a frozen Segment Anything Model 3 (SAM3) for few-shot semantic segmentation (FSS) by leveraging its Promptable Concept Segmentation (PCS) capability. The authors spatially concatenate the support and query images into a shared canvas, which lets SAM3 segment the query without any fine-tuning. Experiments on PASCAL-$5^i$ and COCO-$20^i$ demonstrate state-of-the-art FSS performance. The authors also find that negative prompts are surprisingly counterproductive in this setting, often weakening the target representation and leading to prediction collapse.

Key Contribution

Keeping SAM3 frozen and simply concatenating the support and query images into a single canvas achieves state-of-the-art few-shot segmentation, suggesting that strong cross-image reasoning can emerge from surprisingly simple formulations.

Abstract

Few-Shot Semantic Segmentation (FSS) focuses on segmenting novel object categories from only a handful of annotated examples. Most existing approaches rely on extensive episodic training to learn transferable representations, which is both computationally demanding and sensitive to distribution shifts. In this work, we revisit FSS from the perspective of modern vision foundation models and explore the potential of Segment Anything Model 3 (SAM3) as a training-free solution. By repurposing its Promptable Concept Segmentation (PCS) capability, we adopt a simple spatial concatenation strategy that places support and query images into a shared canvas, allowing a fully frozen SAM3 to perform segmentation without any fine-tuning or architectural changes. Experiments on PASCAL-$5^i$ and COCO-$20^i$ show that this minimal design already achieves state-of-the-art performance, outperforming many heavily engineered methods. Beyond empirical gains, we uncover that negative prompts can be counterproductive in few-shot settings, where they often weaken target representations and lead to prediction collapse despite their intended role in suppressing distractors. These findings suggest that strong cross-image reasoning can emerge from simple spatial formulations, while also highlighting limitations in how current foundation models handle conflicting prompt signals. Code at: https://github.com/WongKinYiu/FSS-SAM3
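To make the spatial concatenation idea concrete, here is a minimal NumPy sketch of the canvas bookkeeping, written under explicit assumptions rather than taken from the authors' code. It assumes a horizontal layout (support on the left, query on the right, shorter image zero-padded at the bottom), that the support mask is turned into a single positive box exemplar, and that no negative exemplars are passed, in line with the paper's finding about negative prompts. The model call is abstracted as a hypothetical `segment_fn(canvas, boxes, labels)` returning a boolean canvas-sized mask; it is not the SAM3 API. The helper names `make_canvas`, `mask_to_box`, `crop_query_mask`, and `few_shot_segment` are likewise hypothetical. See the linked repository for the actual implementation.

```python
import numpy as np


def make_canvas(support, query, pad_value=0):
    """Place support (left) and query (right) side by side on one canvas.

    Both inputs are H x W x C arrays with the same channel count (assumption).
    The shorter image is padded at the bottom so both share the canvas height.
    Returns the canvas and the (row, col) offset of the query inside it.
    """
    h = max(support.shape[0], query.shape[0])
    w_s, w_q = support.shape[1], query.shape[1]
    canvas = np.full((h, w_s + w_q, support.shape[2]), pad_value, dtype=support.dtype)
    canvas[: support.shape[0], :w_s] = support
    canvas[: query.shape[0], w_s:] = query
    return canvas, (0, w_s)


def mask_to_box(mask):
    """Tight (x0, y0, x1, y1) box around a binary mask, exclusive max corner."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("support mask is empty")
    return (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)


def crop_query_mask(canvas_mask, query_hw, offset):
    """Cut the query region back out of a canvas-sized prediction."""
    r, c = offset
    return canvas_mask[r : r + query_hw[0], c : c + query_hw[1]]


def few_shot_segment(support, support_mask, query, segment_fn):
    """One-shot pipeline: canvas -> positive exemplar box -> model -> crop."""
    canvas, offset = make_canvas(support, query)
    # Support sits at the canvas origin, so its box needs no coordinate shift.
    box = mask_to_box(support_mask)
    # Single positive exemplar only; no negative prompts are added.
    canvas_mask = segment_fn(canvas, boxes=[box], labels=[True])
    return crop_query_mask(canvas_mask, query.shape[:2], offset)
```

With a toy `segment_fn` that thresholds pixel intensity, the cropped output is exactly the query-side half of the canvas prediction, which is a quick way to check the offset logic before swapping in a real model. The horizontal layout is only one choice; a vertical layout would change the offset to `(h_support, 0)` and require the support box to stay at the origin in the same way.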

Computer Vision · Multimodal Models · Open-Source Models & Weights

Citation Metrics

Citations: 0
Influential citations: 0
References: 24
Year: 2026
Venue: N/A
