IDEA | Stony Brook | Feb 23, 2026 | arXiv:2602.19432

CountEx: Fine-Grained Counting via Exemplars and Exclusion

Yifeng Huang, Gia Khanh Nguyen, Minh Hoai

AI Summary

The paper introduces CountEx, a discriminative visual counting framework that allows users to specify both inclusion and exclusion criteria via multimodal prompts (language and visual exemplars) to improve counting accuracy in cluttered scenes. CountEx employs a novel Discriminative Query Refinement module that reasons over inclusion and exclusion cues by identifying shared visual features, isolating exclusion-specific patterns, and applying selective suppression to refine the counting query. Experiments on the newly introduced CoCount benchmark, comprising 1,780 videos and 10,086 annotated frames across 97 category pairs, demonstrate that CountEx significantly outperforms existing prompt-based counting methods.

Key Contribution

Stop overcounting visually similar objects: CountEx lets you specify what to ignore, not just what to count, leading to more accurate visual counting.

Abstract

This paper presents CountEx, a discriminative visual counting framework designed to address a key limitation of existing prompt-based methods: the inability to explicitly exclude visually similar distractors. While current approaches allow users to specify what to count via inclusion prompts, they often struggle in cluttered scenes with confusable object categories, leading to ambiguity and overcounting. CountEx enables users to express both inclusion and exclusion intent, specifying what to count and what to ignore, through multimodal prompts including natural language descriptions and optional visual exemplars. At the core of CountEx is a novel Discriminative Query Refinement module, which jointly reasons over inclusion and exclusion cues by first identifying shared visual features, then isolating exclusion-specific patterns, and finally applying selective suppression to refine the counting query. To support systematic evaluation of fine-grained counting methods, we introduce CoCount, a benchmark comprising 1,780 videos and 10,086 annotated frames across 97 category pairs. Experiments show that CountEx achieves substantial improvements over state-of-the-art methods for counting objects from both known and novel categories. The data and code are available at https://github.com/bbvisual/CountEx.
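The three steps attributed to the Discriminative Query Refinement module (find shared features, isolate exclusion-specific patterns, suppress selectively) can be illustrated with a toy vector analogy. This is only a sketch under strong assumptions and not the paper's implementation: it treats the inclusion and exclusion prompts as single fixed vectors and uses a plain orthogonal projection, whereas the actual module is presumably learned. The names `refine_query`, `dot`, and `strength` are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def refine_query(q_inc, q_exc, strength=1.0):
    """Toy analogy of query refinement (hypothetical, not the CountEx module).

    q_inc: inclusion query vector (what to count)
    q_exc: exclusion query vector (what to ignore)
    strength: assumed scalar controlling how hard to suppress
    """
    norm_sq = dot(q_inc, q_inc)
    if norm_sq == 0:
        raise ValueError("inclusion query must be non-zero")

    # Step 1: shared component = projection of the exclusion query
    # onto the inclusion query direction.
    scale = dot(q_exc, q_inc) / norm_sq
    shared = [scale * x for x in q_inc]

    # Step 2: exclusion-specific pattern = what remains of the exclusion
    # query once the shared component is removed.
    exc_specific = [e - s for e, s in zip(q_exc, shared)]

    # Step 3: selective suppression = subtract only the exclusion-specific
    # part, so features shared with the target are left untouched.
    refined = [q - strength * r for q, r in zip(q_inc, exc_specific)]
    return shared, exc_specific, refined


# Usage: the target and distractor share the first feature dimension;
# only the distractor has the second one.
q_inc = [1.0, 0.0]
q_exc = [1.0, 1.0]
shared, exc_specific, refined = refine_query(q_inc, q_exc)

target = [1.0, 0.0]
distractor = [1.0, 1.0]
print(dot(q_inc, target), dot(q_inc, distractor))      # raw query scores
print(dot(refined, target), dot(refined, distractor))  # refined query scores
```

In this toy setting the raw inclusion query scores the target and the distractor identically, while the refined query keeps the target's score and lowers the distractor's. Subtracting the full exclusion vector instead would also penalize the features the two categories share, which is the failure mode that the "selective" step is meant to avoid.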

Computer Vision, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
