May 6, 2026 · arXiv:2605.05077

FlowDIS: Language-Guided Dichotomous Image Segmentation with Flow Matching

AI Summary

The paper introduces FlowDIS, a dichotomous image segmentation method built on flow matching. It learns a time-dependent vector field that transports the image distribution to the corresponding mask distribution, optionally conditioned on a text prompt. A Position-Aware Instance Pairing (PAIP) training strategy strengthens controllability through text prompts, enabling precise, pixel-level object segmentation. Experiments show that FlowDIS outperforms state-of-the-art approaches both with and without language guidance: on the DIS-TE test set it achieves a 5.5% higher $F_β^ω$ and a 43% lower MAE than the best prior DIS method.
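The summary describes flow matching only at a high level. As a hedged illustration of the generic technique (not FlowDIS's actual implementation), the sketch below assumes a straight-line, rectified-flow style path from a source sample x0 (here the image) to a target x1 (here its mask), the velocity regression target x1 − x0, and explicit Euler integration at inference. The function names `interpolate`, `target_velocity`, `flow_matching_loss`, and `euler_sample` are hypothetical. The toy arrays stand in for images or their latents, and an oracle velocity function replaces the trained, optionally text-conditioned network. The paper's actual path, encoders, conditioning, and sampler may differ.

```python
import numpy as np


def interpolate(x0: np.ndarray, x1: np.ndarray, t: float) -> np.ndarray:
    """Point at time t in [0, 1] on the straight path from x0 (image) to x1 (mask)."""
    return (1.0 - t) * x0 + t * x1


def target_velocity(x0: np.ndarray, x1: np.ndarray) -> np.ndarray:
    """Regression target: the straight path's constant velocity d/dt x_t = x1 - x0."""
    return x1 - x0


def flow_matching_loss(v_pred: np.ndarray, x0: np.ndarray, x1: np.ndarray) -> float:
    """Mean squared error between a predicted velocity and the path's target velocity."""
    return float(np.mean((v_pred - target_velocity(x0, x1)) ** 2))


def euler_sample(velocity_fn, x0: np.ndarray, num_steps: int = 10) -> np.ndarray:
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with explicit Euler steps."""
    x = x0.astype(float)  # astype returns a copy, so x0 is not modified
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x


# Toy data (assumption): a random "image" and a thresholded binary "mask".
rng = np.random.default_rng(0)
image = rng.random((4, 4))
mask = (image > 0.5).astype(float)

# One training-style step: sample t, build x_t, and score a velocity prediction.
t = float(rng.random())
x_t = interpolate(image, mask, t)


def oracle(x: np.ndarray, t: float) -> np.ndarray:
    """Hypothetical stand-in for a trained network v_theta(x_t, t, prompt)."""
    return target_velocity(image, mask)


loss = flow_matching_loss(oracle(x_t, t), image, mask)  # 0.0 for the oracle
pred_mask = euler_sample(oracle, image, num_steps=10)   # matches mask up to float error
```

Because the target velocity along a straight path is constant, Euler integration with the oracle reaches the mask for any number of steps. A learned network only approximates this field, so its trajectories need not be perfectly straight, and more integration steps can reduce the resulting error.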

Key Contribution

FlowDIS achieves state-of-the-art dichotomous image segmentation using flow matching and also supports precise, pixel-level control through text prompts.

Abstract

Accurate image segmentation is essential for modern computer vision applications such as image editing, autonomous driving, and medical image analysis. In recent years, Dichotomous Image Segmentation (DIS) has become a standard task for training and evaluating highly accurate segmentation models. Existing DIS approaches often fail to preserve fine-grained details or fully capture the semantic structure of the foreground. To address these challenges, we present FlowDIS, a novel dichotomous image segmentation method built on the flow matching framework, which learns a time-dependent vector field to transport the image distribution to the corresponding mask distribution, optionally conditioned on a text prompt. Moreover, with our Position-Aware Instance Pairing (PAIP) training strategy, FlowDIS offers strong controllability through text prompts, enabling precise, pixel-level object segmentation. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches both with and without language guidance. Compared with the best prior DIS method, FlowDIS achieves a 5.5% higher $F_β^ω$ measure and 43% lower MAE ($\mathcal{M}$) on the DIS-TE test set. The code is available at: https://github.com/Picsart-AI-Research/FlowDIS
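The abstract reports MAE ($\mathcal{M}$) alongside $F_β^ω$. As a hedged illustration, the sketch below computes the usual pixel-wise mean absolute error between a soft prediction and a ground-truth mask, both assumed to be scaled to [0, 1], together with the relative reduction between two MAE values. Reading "43% lower" as a relative reduction is an assumption. The names `mae` and `relative_reduction` and the toy arrays are hypothetical and are not results from the paper. The weighted F-measure $F_β^ω$ is not reproduced here.

```python
import numpy as np


def mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """Pixel-wise mean absolute error; pred and gt are assumed to lie in [0, 1]."""
    return float(np.mean(np.abs(pred.astype(float) - gt.astype(float))))


def relative_reduction(baseline: float, new: float) -> float:
    """Fractional decrease of `new` relative to `baseline` (0.43 means 43% lower)."""
    return (baseline - new) / baseline


# Toy values only (hypothetical), not results from the paper.
gt = np.array([[0, 1], [1, 1]])
pred_baseline = np.array([[0.2, 0.8], [1.0, 0.6]])
pred_new = np.array([[0.1, 0.9], [1.0, 0.8]])

mae_baseline = mae(pred_baseline, gt)  # approximately 0.2
mae_new = mae(pred_new, gt)            # approximately 0.1
print(f"relative MAE reduction: {relative_reduction(mae_baseline, mae_new):.0%}")
```

MAE penalizes every uncertain or misclassified pixel linearly, so lower is better. This is why an improvement appears as a reduction rather than an increase.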

Computer Vision · Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
