This paper introduces SeSAM, a framework that adapts the Segment Anything Model (SAM) for weakly supervised semantic segmentation by addressing the challenges of applying instance-based SAM to class-based segmentation. SeSAM decomposes class masks into connected components, samples point prompts along object skeletons, selects SAM masks using weak-label coverage, and iteratively refines labels using pseudo-labels. Experiments show that SeSAM outperforms weakly supervised baselines across multiple benchmarks and weak annotation types while substantially reducing annotation cost compared with full pixel-level supervision.
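The first two steps, splitting a class mask into connected components and placing point prompts on each component's skeleton, can be sketched in plain NumPy. This is a minimal illustration, not SeSAM's implementation: the 8-connectivity, the Zhang-Suen thinning algorithm used for skeletonization, the farthest-point sampling rule, and the helper names and `points_per_component` parameter are all hypothetical choices made for the example.

```python
from collections import deque

import numpy as np


def connected_components(mask: np.ndarray) -> list:
    """Split a binary class mask into 8-connected components via BFS flood fill."""
    mask = mask.astype(bool)
    seen = np.zeros_like(mask)
    h, w = mask.shape
    components = []
    for sy, sx in zip(*np.nonzero(mask)):
        if seen[sy, sx]:
            continue
        comp = np.zeros_like(mask)
        queue = deque([(sy, sx)])
        seen[sy, sx] = True
        while queue:
            y, x = queue.popleft()
            comp[y, x] = True
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
        components.append(comp)
    return components


def zhang_suen_skeleton(comp: np.ndarray) -> np.ndarray:
    """Thin a binary component to a one-pixel-wide skeleton (Zhang-Suen thinning)."""
    img = np.pad(comp.astype(np.uint8), 1)  # zero border keeps neighbour lookups in bounds
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for y, x in zip(*np.nonzero(img)):
                # Neighbours P2..P9, clockwise starting from north.
                p = [int(v) for v in (img[y - 1, x], img[y - 1, x + 1], img[y, x + 1],
                                      img[y + 1, x + 1], img[y + 1, x], img[y + 1, x - 1],
                                      img[y, x - 1], img[y - 1, x - 1])]
                b = sum(p)  # number of foreground neighbours
                a = sum(1 for i in range(8) if p[i] == 0 and p[(i + 1) % 8] == 1)  # 0->1 transitions
                if not (2 <= b <= 6 and a == 1):
                    continue
                if step == 0:
                    ok = p[0] * p[2] * p[4] == 0 and p[2] * p[4] * p[6] == 0
                else:
                    ok = p[0] * p[2] * p[6] == 0 and p[0] * p[4] * p[6] == 0
                if ok:
                    to_delete.append((y, x))
            for y, x in to_delete:  # delete only after the whole sub-pass is scanned
                img[y, x] = 0
            changed = changed or bool(to_delete)
    skel = img[1:-1, 1:-1].astype(bool)
    # Zhang-Suen can erase tiny blobs (e.g. 2x2 squares); fall back to the component itself.
    return skel if skel.any() else comp.astype(bool)


def sample_skeleton_points(skel: np.ndarray, k: int) -> np.ndarray:
    """Pick up to k well-spread skeleton pixels by farthest-point sampling; rows are (x, y)."""
    pts = np.argwhere(skel)  # (N, 2) array of (row, col)
    k = min(k, len(pts))
    if k == 0:
        return np.zeros((0, 2), dtype=np.int64)
    # Start at the skeleton pixel nearest the centroid, then repeatedly add the farthest one.
    start = int(np.argmin(((pts - pts.mean(axis=0)) ** 2).sum(axis=1)))
    chosen = [start]
    dist = ((pts - pts[start]) ** 2).sum(axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        dist = np.minimum(dist, ((pts - pts[nxt]) ** 2).sum(axis=1))
    return pts[chosen][:, ::-1]  # swap to (x, y), the order SAM point prompts use


def point_prompts_from_class_mask(mask: np.ndarray, points_per_component: int = 3) -> list:
    """Hypothetical helper: one array of positive point prompts per connected component."""
    return [sample_skeleton_points(zhang_suen_skeleton(comp), points_per_component)
            for comp in connected_components(mask)]
```

One motivation for skeleton-based placement is that a single centroid point can fall outside a curved or ring-shaped object, whereas skeleton pixels always lie inside it; spreading several points along the skeleton also covers elongated objects end to end.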
SAM, designed for instance segmentation, can be surprisingly effective for semantic segmentation with weak supervision when adapted with techniques like skeleton-based prompting and iterative pseudo-label refinement.
Semantic segmentation requires dense pixel-level annotations, which are costly and time-consuming to acquire. To address this, we present SeSAM, a framework that uses a segmentation foundation model, the Segment Anything Model (SAM), with weak labels, including coarse masks, scribbles, and points. SAM, originally designed for instance-based segmentation, cannot be directly used for semantic segmentation tasks. In this work, we identify the specific challenges SAM faces in this setting and determine the components needed to adapt it for class-based segmentation using weak labels. Specifically, SeSAM decomposes class masks into connected components, samples point prompts along object skeletons, selects SAM masks using weak-label coverage, and iteratively refines labels using pseudo-labels, enabling SAM-generated masks to be used effectively for semantic segmentation. Integrated with a semi-supervised learning framework, SeSAM balances ground-truth labels, SAM-based pseudo-labels, and high-confidence pseudo-labels, significantly improving segmentation quality. Extensive experiments across multiple benchmarks and weak annotation types show that SeSAM consistently outperforms weakly supervised baselines while substantially reducing annotation cost relative to fine supervision.
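Two further steps, choosing one SAM mask per component by how well it covers the weak label, and balancing the three label sources during semi-supervised training, could look roughly like the sketch below. The abstract does not specify either rule, so the coverage definition, the `min_coverage` threshold, the smallest-passing-mask tie-break, the fixed priority order, the `conf_thresh` value, the ignore index 255, and all function names are hypothetical assumptions for illustration. The candidates could be, for example, the several masks SAM can return for a single prompt.

```python
import numpy as np

IGNORE = 255  # hypothetical ignore index for unlabeled pixels


def select_mask_by_coverage(candidates, weak, min_coverage=0.9):
    """Choose one SAM candidate mask for a component using weak-label coverage.

    Hypothetical rule: coverage = |candidate & weak| / |weak|. Among candidates
    reaching min_coverage, keep the smallest (least spill-over); otherwise fall
    back to the candidate with the highest coverage.
    """
    weak = weak.astype(bool)
    n_weak = int(weak.sum())
    if n_weak == 0 or len(candidates) == 0:
        return None
    scored = []
    for cand in candidates:
        cand = cand.astype(bool)
        coverage = np.logical_and(cand, weak).sum() / n_weak
        scored.append((coverage, int(cand.sum()), cand))
    passing = [s for s in scored if s[0] >= min_coverage]
    if passing:
        return min(passing, key=lambda s: s[1])[2]
    return max(scored, key=lambda s: s[0])[2]


def fuse_labels(gt, sam_pl, probs, conf_thresh=0.95):
    """Merge three label sources into one training target, by priority.

    gt, sam_pl: (H, W) integer maps holding IGNORE where a source has no label.
    probs: (C, H, W) class probabilities from the current segmentation model.
    Priority (an assumption): weak ground truth > SAM pseudo-label > confident prediction.
    """
    target = np.full(gt.shape, IGNORE, dtype=np.int64)
    conf, pred = probs.max(axis=0), probs.argmax(axis=0)
    confident = conf >= conf_thresh
    target[confident] = pred[confident]  # lowest priority: confident model predictions
    has_sam = sam_pl != IGNORE
    target[has_sam] = sam_pl[has_sam]    # SAM pseudo-labels override model predictions
    has_gt = gt != IGNORE
    target[has_gt] = gt[has_gt]          # weak ground-truth labels always win
    return target
```

A fixed priority is only one way to "balance" the sources; weighting each source's loss term differently would be an equally plausible reading of the abstract.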