This paper introduces a Divide-and-Conquer Holistic Cognition Network (DHCNet) to improve ultra-fine-grained visual categorization (Ultra-FGVC) in data-limited scenarios by focusing on holistic cues. DHCNet decomposes holistic cues into spatially-associated subtle discrepancies. It applies a self-shuffling operation to local regions to establish spatial associations among these discrepancies and to guide perception of the original topological structure. The holistic cues are iteratively refined during training and used as supervisory signals, and experiments on five Ultra-FGVC datasets show state-of-the-art performance.
Decomposing holistic visual cues into subtle, spatially-associated discrepancies allows for state-of-the-art ultra-fine-grained classification even with limited training data.
Ultra-fine-grained visual categorization (Ultra-FGVC) aims to classify highly similar subcategories within fine-grained objects using limited training samples. However, holistic yet discriminative cues, such as leaf contours in extremely similar cultivars, remain under-explored in current studies, which limits recognition performance. Modeling holistic cues with complex morphological structures is crucial but typically requires massive training samples, posing significant challenges in data-limited scenarios. To address this challenge, we propose a novel Divide-and-Conquer Holistic Cognition Network (DHCNet). It implements a divide-and-conquer strategy that decomposes holistic cues into spatially-associated subtle discrepancies and progressively establishes the holistic cognition process, which significantly simplifies holistic cognition while reducing dependency on training data. Technically, DHCNet begins by progressively analyzing subtle discrepancies, transitioning from smaller local patches to larger ones using a self-shuffling operation on local regions. Simultaneously, it leverages the unaffected local regions to help guide the perception of the original topological structure among the shuffled patches, thereby aiding in the establishment of spatial associations for these discrepancies. Additionally, DHCNet incorporates the online refinement of the holistic cues discovered from local regions into the training process to iteratively improve their quality. DHCNet then uses these holistic cues as supervisory signals to fine-tune the parameters of the recognition model, improving its sensitivity to holistic cues across entire objects. Extensive evaluations demonstrate that DHCNet achieves remarkable performance on five widely-used Ultra-FGVC datasets.
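The self-shuffling operation on local regions can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `self_shuffle_region`, the square region, the non-overlapping tiles, and the uniform random permutation are all assumptions made for illustration, since the abstract does not specify these details. The sketch shuffles the tiles inside one local region, leaves every pixel outside that region unaffected, and returns the permutation, which a model could in principle be trained to recover.

```python
import numpy as np


def self_shuffle_region(image, top, left, size, patch, rng):
    """Shuffle the patch-by-patch tiles inside one square local region.

    Hypothetical helper for illustration only. Pixels outside the region
    are left untouched. Returns the new image and the permutation `perm`,
    where tile i of the shuffled region came from tile perm[i] of the
    original region (tiles are numbered in row-major order).
    """
    if size % patch != 0:
        raise ValueError("region size must be a multiple of the patch size")
    n = size // patch  # tiles per side of the region
    out = image.copy()
    region = image[top:top + size, left:left + size]
    # Collect the original tiles in row-major order (views into `image`).
    tiles = [
        region[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
        for r in range(n)
        for c in range(n)
    ]
    perm = rng.permutation(n * n)
    for dst, src in enumerate(perm):
        r, c = divmod(dst, n)
        rows = slice(top + r * patch, top + (r + 1) * patch)
        cols = slice(left + c * patch, left + (c + 1) * patch)
        out[rows, cols] = tiles[src]
    return out, perm


# Progress from smaller patches to larger ones within the same region.
rng = np.random.default_rng(0)
image = np.arange(8 * 8).reshape(8, 8)
shuffled_by_patch_size = {
    patch: self_shuffle_region(image, top=2, left=2, size=4, patch=patch, rng=rng)
    for patch in (1, 2)
}
```

In this sketch the border around the 4x4 region stays intact for every patch size, which mirrors the role the abstract assigns to unaffected local regions as context for the shuffled patches.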