HIT · HKU · National Clinical Research Center for Hematologic · PKU · Apr 2, 2026 · arXiv:2604.01742

Dense Point-to-Mask Optimization with Reinforced Point Selection for Crowd Instance Segmentation

Hongru Chen, Jiyang Huang, Jia Wan, Antoni B. Chan

AI Summary

This paper introduces Dense Point-to-Mask Optimization (DPMO), which combines SAM with a Nearest Neighbor Exclusive Circle (NNEC) constraint to generate instance segmentation masks from point annotations in crowded scenes. The authors then use the generated masks to train a Reinforced Point Selection (RPS) framework with Group Relative Policy Optimization (GRPO) to predict instance segmentation. Experiments on four crowd datasets show state-of-the-art segmentation performance and indicate that mask annotations significantly improve counting accuracy.

Key Contribution

Turns out, you can get SOTA crowd instance segmentation by cleverly combining SAM with point supervision and reinforcement learning to select optimal points for mask generation.

Abstract

Crowd instance segmentation is a crucial task with a wide range of applications, including surveillance and transportation. Currently, point labels are common in crowd datasets, while region labels (e.g., boxes) are rare and inaccurate. The masks obtained through segmentation help to improve the accuracy of region labels and resolve the correspondence between individual location coordinates and crowd density maps. However, directly applying currently popular large foundation models such as SAM does not yield ideal results in dense crowds. To this end, we first propose Dense Point-to-Mask Optimization (DPMO), which integrates SAM with the Nearest Neighbor Exclusive Circle (NNEC) constraint to generate dense instance segmentation from point annotations. With DPMO and manual correction, we obtain mask annotations from the existing point annotations for traditional crowd datasets. Then, to predict instance segmentation in dense crowds, we propose a Reinforced Point Selection (RPS) framework trained with Group Relative Policy Optimization (GRPO), which selects the best predicted point from a sampling of the initial point prediction. Through extensive experiments, we achieve state-of-the-art crowd instance segmentation performance on ShanghaiTech, UCF-QNRF, JHU-CROWD++, and NWPU-Crowd datasets. Furthermore, we design new loss functions supervised by masks that boost counting performance across different models, demonstrating the significant role of mask annotations in enhancing counting accuracy.
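The abstract says RPS is trained with GRPO and picks the best point from a sampled set of candidates. As a rough illustration of the group-relative idea only, the sketch below normalizes each candidate's reward against the mean and standard deviation of its own sampling group, which is the core advantage computation in GRPO. The candidate coordinates, the reward values, the function name `group_relative_advantages`, and the choice of population standard deviation are all assumptions made for this example; the abstract does not specify the reward or the policy network.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Return GRPO-style advantages: each reward normalized within its group.

    `rewards` holds one scalar per sampled candidate for the same instance.
    Using the population standard deviation is an assumption of this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every candidate gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical candidate points (x, y) sampled around one initial prediction
candidates = [(120, 84), (123, 80), (118, 90), (125, 86)]
# Hypothetical rewards, e.g. a mask-quality score for each candidate
rewards = [0.62, 0.81, 0.40, 0.73]

advantages = group_relative_advantages(rewards)

# The candidate with the largest advantage is the one the policy update
# would reinforce most strongly within this group
best_index = max(range(len(candidates)), key=lambda i: advantages[i])
best_point = candidates[best_index]
```

Because the advantages are computed relative to the group, no separate value network is needed; candidates that score above the group mean receive positive advantages and those below receive negative ones.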

Citation Metrics

Citations: 0

Influential citations: 0

References: 48

Year: 2026

Venue: N/A
