Mar 19, 2026 · arXiv:2603.19218

Rethinking Vector Field Learning for Generative Segmentation

Chaoyang Wang, Yaobo Liang, Boci Peng, Fan Duan, Jingdong Wang, Yunhai Tong

AI Summary

This paper analyzes the limitations of standard continuous flow matching objectives for generative segmentation with diffusion models, identifying gradient vanishing and trajectory traversing as the key issues. To address them, the authors introduce a vector field reshaping strategy that adds a detached, distance-aware correction term to the learned velocity field, increasing gradient magnitudes near class centroids. They also propose a quasi-random category encoding scheme based on Kronecker sequences for pixel-level semantic alignment. The paper reports significant improvements over vanilla flow matching.
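For readers unfamiliar with the baseline objective, the following is a minimal sketch of a vanilla flow matching regression target under one common convention: a straight-line path from a noise sample `x0` to a target `x1` (here, an encoded label). The function names and the linear path are assumptions made for illustration. The page does not give the paper's exact parameterization, and the proposed correction term is not reproduced because its form is not stated here.

```python
def interpolate(x0, x1, t):
    """Point on the straight-line path from x0 (t=0) to x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight-line path; constant in t for this path."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(predicted_v, x0, x1):
    """Mean squared error between a predicted velocity and the path velocity.

    `predicted_v` stands in for the output of a hypothetical network
    evaluated at (interpolate(x0, x1, t), t).
    """
    target = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(predicted_v, target)) / len(target)
```

In training, `t` would typically be sampled at random and the loss averaged over pixels and samples; that loop is omitted here.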

Key Contribution

Diffusion models can substantially narrow the segmentation gap to strong discriminative methods when their vector fields are reshaped with a distance-aware correction term that counteracts gradient vanishing.

Abstract

Taming diffusion models for generative segmentation has attracted increasing attention. While existing approaches primarily focus on architectural tweaks or training heuristics, there remains a limited understanding of the intrinsic mismatch between continuous flow matching objectives and discrete perception tasks. In this work, we revisit diffusion segmentation from the perspective of vector field learning. We identify two key limitations of the commonly used flow matching objective: gradient vanishing and trajectory traversing, which result in slow convergence and poor class separation. To tackle these issues, we propose a principled vector field reshaping strategy that augments the learned velocity field with a detached distance-aware correction term. This correction introduces both attractive and repulsive interactions, enhancing gradient magnitudes near centroids while preserving the original diffusion training framework. Furthermore, we design a computationally efficient, quasi-random category encoding scheme inspired by Kronecker sequences, which integrates seamlessly with an end-to-end pixel neural field framework for pixel-level semantic alignment. Extensive experiments consistently demonstrate significant improvements over vanilla flow matching approaches, substantially narrowing the performance gap between generative segmentation and strong discriminative specialists.
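To make the Kronecker-sequence idea concrete, here is a minimal sketch of a quasi-random category encoding. A Kronecker sequence takes the fractional parts of integer multiples of a vector of irrational numbers. The choice of irrationals below (square roots of the first primes), the function names, and the mapping from category index to code are assumptions for illustration, not the scheme described in the paper.

```python
import math


def first_primes(count):
    """Return the first `count` primes by trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def kronecker_codes(num_classes, dim):
    """Assign each category k = 1..num_classes a point in [0, 1)^dim.

    Coordinate j of category k is frac(k * alpha_j), where alpha_j is the
    fractional part of the square root of the j-th prime (an assumed choice).
    """
    alphas = [math.sqrt(p) % 1.0 for p in first_primes(dim)]
    return [[(k * a) % 1.0 for a in alphas] for k in range(1, num_classes + 1)]
```

Because each code depends only on the category index, codes can be computed on the fly without storing a lookup table, and adding categories does not change existing codes.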

Architecture Design (Transformers, SSMs, MoE) · Computer Vision

Citation Metrics

Citations: 0

Influential citations: 0

References: 50

Year: 2026

Venue: N/A
