This paper introduces RT-Counter, a real-time framework for text-guided open-vocabulary object counting that overcomes limitations in fine-grained spatial understanding and inference speed. By implementing a Visual Prototype Textualization (VPT) module and Weaving Transformer (Weaformer) layers, RT-Counter achieves a remarkable balance between counting accuracy and computational efficiency. Experimental results demonstrate that RT-Counter operates at 112.48 FPS with a mean absolute error (MAE) of 13.30 on the FSC147 dataset, outperforming existing methods in both speed and parameter efficiency.
RT-Counter runs 7.4x faster than existing leading text-guided counting methods while maintaining competitive accuracy, breaking the traditional accuracy-speed trade-off.
Text-guided open-vocabulary object counting (TOOC) aims to count objects belonging to the categories specified by natural language descriptions. Although vision-language pre-trained models have been successfully applied to TOOC tasks, they still struggle with fine-grained spatial understanding and real-time inference requirements in counting scenarios. To address these limitations, this paper proposes a real-time TOOC framework, called the Real-Time Counter (RT-Counter), that achieves both good counting accuracy and high computational efficiency. RT-Counter introduces a novel Visual Prototype Textualization (VPT) module that projects learned visual features into a text feature space. The resulting features combine the abstract information that is hard to capture with visual prototypes and the detailed prototype information that is difficult to describe in text, enhancing the object-level vision-language model's counting capabilities. Additionally, RT-Counter incorporates our Weaving Transformer (Weaformer) layers, which maintain high descriptive power at a fraction of the computational cost. The Weaformer layer adopts a novel hybrid attention mechanism that efficiently weaves together local and global visual features. Extensive experiments on three public datasets show that RT-Counter successfully breaks the accuracy-speed trade-off in TOOC. While achieving a competitive MAE of 13.30 on FSC147, RT-Counter operates at 112.48 FPS, making it 7.4x faster and over 4x more parameter-efficient than the existing leading methods in TOOC. Our work aims at balancing high accuracy and real-time performance in TOOC. Code is available at: https://github.com/Jason-Mar1/RT-Counter.
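To make the reported figures concrete, the following is a minimal sketch of how a mean absolute error over per-image counts is typically computed, and of the baseline frame rate that a 7.4x speedup at 112.48 FPS would imply. The function names and the example counts are hypothetical and chosen only for illustration; they are not taken from the RT-Counter code base, and the implied baseline is simple arithmetic on the two quoted numbers rather than a value reported in the text.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference between predicted and ground-truth counts."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(predicted) == 0:
        raise ValueError("at least one image is required")
    total_error = sum(abs(p - a) for p, a in zip(predicted, actual))
    return total_error / len(predicted)


def implied_baseline_fps(fps: float, speedup: float) -> float:
    """Frame rate of a baseline that is `speedup` times slower than `fps`."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return fps / speedup


if __name__ == "__main__":
    # Hypothetical per-image counts, not results from the paper.
    predicted_counts = [10, 20, 33]
    true_counts = [12, 20, 30]
    print(f"MAE: {mean_absolute_error(predicted_counts, true_counts):.2f}")

    # Uses only the two figures quoted above (112.48 FPS, 7.4x).
    print(f"Implied baseline: {implied_baseline_fps(112.48, 7.4):.1f} FPS")
```

Under this reading, an MAE of 13.30 means the predicted count is off by about 13 objects per image on average, and the speedup figure corresponds to a comparison method running at roughly 15 FPS.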