Shenzhen University, City University of Hong Kong
4
0
5
Transformer-based architectures can now outperform CNNs in multi-view crowd tracking, especially in large, complex real-world scenes, thanks to a novel view-ground interaction mechanism.
Turns out, you can get SOTA crowd instance segmentation by cleverly combining SAM with point supervision and reinforcement learning to select optimal points for mask generation.
By cleverly using readily available video segmentation masks, this method boosts DINOv2's point tracking performance by over 14% – a surprisingly effective way to inject temporal awareness into static image-pretrained models.
Achieve state-of-the-art semi-supervised crowd instance segmentation and counting by generating high-quality mask supervision from sparse annotations, effectively bridging the gap between these two tasks.