This paper introduces Robust-TOOC, a benchmark for evaluating Text-guided Open-vocabulary Object Counting (TOOC) under real-world degradation conditions, including rain, fog, darkness, and sensor noise. To improve robustness without altering the original counting architecture, the authors propose Dual-TTT, a test-time training framework that updates only a lightweight text-guided denoising module while keeping the counting network frozen. Experiments on multiple recent TOOC baselines indicate that Dual-TTT improves counting under adverse conditions.
Adverse conditions such as rain, fog, and sensor noise can severely impair text-guided object counting, but a test-time training approach improves robustness without requiring architectural changes.
Text-guided Open-vocabulary Object Counting (TOOC) enables counting arbitrary object categories specified by text prompts, offering substantially greater flexibility than conventional closed-set counting. However, existing TOOC methods are developed and evaluated primarily on ideal images, while real-world scenes often suffer from adverse conditions such as rain, fog, darkness, and sensor noise, which severely degrade visual quality and impair vision-language alignment. To bridge this gap, we introduce Robust-TOOC, the first benchmark for evaluating TOOC under diverse corruption conditions, which covers six representative degradation types: rain, fog, darkness, Gaussian noise, salt-and-pepper noise, and mixed corruption. To improve robustness while preserving the original counting architecture, we propose Dual-TTT, a dual-architecture test-time training framework for TOOC. Specifically, during test-time training, Dual-TTT updates only the Text-guided Lightweight Denoising module (TL-Denoiser), while keeping the original counting network frozen. Inspired by diffusion models, the TL-Denoiser is optimized to remove corruption-aware noise from image representations under degraded conditions. Since only the TL-Denoiser is trained at test time, Dual-TTT is annotation-free and can be seamlessly integrated into existing TOOC models without modifying their original architecture. Extensive experiments on multiple recent TOOC baselines demonstrate the effectiveness of our method.
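To make some of the degradation types listed above concrete, here is a minimal sketch of how Gaussian noise, salt-and-pepper noise, darkness, fog, and a mixed corruption might be simulated on a grayscale image with values in [0, 1]. The abstract does not specify how Robust-TOOC generates its corruptions, so every function name, parameter, and default strength below is an assumption made for illustration. The fog model is the common simplified blend with a constant airlight value, and rain is omitted.

```python
import random

# Illustrative only: these functions and their default strengths are
# assumptions, not the corruption pipeline used by Robust-TOOC.
# An image is a list of rows, each a list of floats in [0, 1].

def add_gaussian_noise(image, sigma=0.1, seed=0):
    """Add zero-mean Gaussian noise to each pixel and clip to [0, 1]."""
    rng = random.Random(seed)
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]

def add_salt_and_pepper(image, amount=0.05, seed=0):
    """Replace roughly `amount` of the pixels with 0.0 (pepper) or 1.0 (salt)."""
    rng = random.Random(seed)
    out = []
    for row in image:
        new_row = []
        for p in row:
            r = rng.random()
            if r < amount / 2:
                new_row.append(0.0)   # pepper
            elif r < amount:
                new_row.append(1.0)   # salt
            else:
                new_row.append(p)     # pixel left untouched
        out.append(new_row)
    return out

def darken(image, gamma=2.5):
    """Simulate low light with a gamma curve (gamma > 1 darkens values in [0, 1])."""
    return [[p ** gamma for p in row] for row in image]

def add_fog(image, transmission=0.6, airlight=0.9):
    """Blend each pixel with a constant airlight: I = J * t + A * (1 - t)."""
    return [[p * transmission + airlight * (1.0 - transmission) for p in row]
            for row in image]

def mixed_corruption(image, seed=0):
    """Chain several corruptions; the order and strengths are arbitrary choices."""
    foggy_dark = add_fog(darken(image))
    noisy = add_gaussian_noise(foggy_dark, seed=seed)
    return add_salt_and_pepper(noisy, seed=seed + 1)

if __name__ == "__main__":
    clean = [[0.5] * 8 for _ in range(8)]   # flat gray test image
    corrupted = mixed_corruption(clean)
    print(len(corrupted), len(corrupted[0]))
```

Fixed seeds keep each corruption reproducible, which matters if the same degraded image is meant to be shown to several counting models for comparison.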