This paper introduces RLFSeg, a novel text-based image segmentation framework that leverages Rectified Flow to directly map images to segmentation masks in the latent space of a pre-trained diffusion model. By circumventing the noise-denoising process inherent in diffusion models, RLFSeg achieves superior performance, particularly in zero-shot scenarios, compared to prior diffusion-based segmentation methods. The framework incorporates label refinement and an Adaptive One-Step Sampling strategy to further enhance accuracy with single-step inference.
Ditching diffusion's noise-denoising process, RLFSeg uses Rectified Flow to map images directly to segmentation masks under text-prompt guidance, improving performance most clearly in zero-shot settings.
Text-based image segmentation aims to delineate object boundaries within an image from text prompts, offering greater flexibility and a broader application scope than traditional fixed-category segmentation tasks. Recent studies have shown that diffusion models (e.g., Stable Diffusion) can provide rich multimodal semantic features, which has led to work that uses diffusion models as feature extractors for segmentation. Such methods, however, inherit the generative nature of diffusion models, which is harmful to discriminative segmentation tasks. In response, we propose RLFSeg, a novel framework that leverages Rectified Flow to learn a direct mapping from the image to the segmentation mask within the latent space. The model is thus freed from the noise-denoise process and from the need to optimize the time step of diffusion models, resulting in substantially better performance than previous diffusion-based methods, especially in zero-shot scenarios. By introducing label refinement and an Adaptive One-Step Sampling strategy, the model achieves higher accuracy even with a single inference step. The framework redirects a pretrained generative model to the discriminative segmentation task with zero modification to the model structure, thus revealing promising application potential and significant research value.
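To make the idea of a direct latent-to-latent mapping concrete, the sketch below shows the generic Rectified Flow recipe: interpolate on a straight line between a source latent and a target latent, regress the constant velocity between them, and recover the target with a single Euler step. It is a minimal illustration under stated assumptions, not the authors' implementation. The names `interpolate`, `flow_matching_loss`, `one_step_sample`, and `velocity_fn` are hypothetical, latents are plain lists of floats rather than Stable Diffusion latents, and the label refinement and Adaptive One-Step Sampling components described in the abstract are not reproduced here.

```python
from typing import Callable, List, Tuple

Vector = List[float]
# Hypothetical stand-in for the network: maps (latent, time) to a predicted velocity.
VelocityFn = Callable[[Vector, float], Vector]


def interpolate(z_img: Vector, z_mask: Vector, t: float) -> Tuple[Vector, Vector]:
    """Return the point at time t on the straight path from z_img to z_mask,
    together with the constant target velocity along that path."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(z_img, z_mask)]
    v_target = [b - a for a, b in zip(z_img, z_mask)]
    return x_t, v_target


def flow_matching_loss(velocity_fn: VelocityFn, z_img: Vector,
                       z_mask: Vector, t: float) -> float:
    """Mean squared error between the predicted and target velocity at time t."""
    x_t, v_target = interpolate(z_img, z_mask, t)
    v_pred = velocity_fn(x_t, t)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_target)) / len(v_target)


def one_step_sample(velocity_fn: VelocityFn, z_img: Vector) -> Vector:
    """Single Euler step from t=0 to t=1, starting at the image latent."""
    v = velocity_fn(z_img, 0.0)
    return [a + b for a, b in zip(z_img, v)]


if __name__ == "__main__":
    # Toy data: an "oracle" that already knows the true velocity.
    z_img = [0.0, 1.0, -2.0]
    z_mask = [1.0, 1.0, 2.0]
    oracle = lambda x, t: [1.0, 0.0, 4.0]
    print(flow_matching_loss(oracle, z_img, z_mask, 0.3))
    print(one_step_sample(oracle, z_img))
```

Because the path starts at the image latent rather than at Gaussian noise, no noise sampling or denoising schedule appears anywhere in the sketch. When the learned velocity is close to constant along the path, one Euler step is enough to reach the target, which is the general reason straight flows pair well with single-step inference.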