Zhejiang University | May 27, 2026 | arXiv:2605.28239

Learning to Label: A Reinforced Self-Evolving Framework for Semi-supervised Referring Expression Segmentation

Runlong Cao, Ying Zang, Chuanwei Zhou, Tianrun Chen, Tong Zhang, Zhen Cui, Chunyan Xu

AI Summary

This paper introduces Learning to Label (L2L), a reinforced self-evolving framework for semi-supervised referring expression segmentation (SS-RES). L2L leverages a multimodal large language model to extract semantic-spatial priors, which are then used as guidance signals for a hierarchical segmentation network. A reinforcement learning approach adaptively rewards high-utility pixel-level pseudo-labels based on multimodal priors and model predictions, enabling joint optimization and improved label reliability.

Key Contribution

Stop hand-crafting pseudo-labels: this framework learns to generate and select them for semi-supervised segmentation, boosting performance on RefCOCO, RefCOCO+, and RefCOCOg.

Abstract

Semi-supervised referring expression segmentation (SS-RES) aims to achieve precise pixel-level language grounding under limited annotation, yet suffers from scarce supervision and unreliable pseudo-labels when exploiting unlabeled image-text pairs. In this work, we propose Learning to Label (L2L), a reinforced self-evolving framework that casts pseudo-label construction as a learnable decision-making process. To build foundational understanding, we leverage a multimodal large language model to extract semantic-spatial priors, which are instantiated as initial soft segmentation proposals and elevated, together with textual cues, into learnable guidance signals that condition a hierarchical segmentation network. To ensure stable learning, reinforced pseudo-label selection is formulated as an exploratory decision process that adaptively rewards high-utility pixel-level supervision based on multimodal priors and model predictions. This reinforced self-evolving loop enables joint optimization of the segmentation model and pseudo-labels, progressively enhancing label reliability under sparse supervision. Extensive experiments on RefCOCO, RefCOCO+, and RefCOCOg demonstrate improvements over existing methods, validating the framework's effectiveness and generalization.
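The abstract does not spell out the policy or the reward, so the following is only a minimal sketch of what "pseudo-label selection as an exploratory decision process" could look like, not the authors' implementation. It assumes a per-pixel Bernoulli keep/drop policy with learnable logits, a hypothetical utility of +1 when the thresholded model prediction agrees with the thresholded prior and -1 otherwise, and a plain REINFORCE update. The names `reinforce_select`, `logits`, `tau`, and `lr` are illustrative and do not come from the paper.

```python
import numpy as np

def reinforce_select(logits, prior, pred, rng, tau=0.5, lr=0.1):
    """One exploratory selection step over pixel-level pseudo-labels.

    logits: learnable per-pixel keep scores (hypothetical policy parameters)
    prior:  soft proposal derived from the multimodal prior, values in [0, 1]
    pred:   segmentation model's foreground probability, same shape
    """
    # Bernoulli policy: probability of keeping each pixel as supervision.
    keep_prob = 1.0 / (1.0 + np.exp(-logits))
    # Exploration: sample a keep/drop action per pixel.
    keep = rng.random(logits.shape) < keep_prob

    # Candidate pseudo-label from the current model prediction.
    pseudo = pred > tau
    # Assumed utility: +1 if prediction and prior agree, -1 otherwise.
    utility = np.where(pseudo == (prior > tau), 1.0, -1.0)
    # Only kept pixels earn (or lose) reward; dropped pixels get 0.
    reward = keep * utility

    # REINFORCE: d log pi(a) / d logit = (a - p) for a Bernoulli policy.
    grad = (keep.astype(np.float64) - keep_prob) * reward
    new_logits = logits + lr * grad  # gradient ascent on expected reward
    return pseudo, keep, utility, new_logits

rng = np.random.default_rng(0)
prior = np.array([[0.9, 0.1], [0.8, 0.2]])
pred = np.array([[0.95, 0.05], [0.3, 0.6]])
pseudo, keep, utility, new_logits = reinforce_select(
    np.zeros((2, 2)), prior, pred, rng
)
```

In this toy setup, pixels where the prediction and the prior agree are pushed toward being kept whenever they are sampled, and pixels where they disagree are pushed toward being dropped. A real system would presumably use a richer reward and update the segmentation model jointly with the selection policy.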

Computer Vision, Multimodal Models, Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
