HELM, a novel framework for hierarchical multi-label classification (HMLC), addresses limitations of existing methods by using hierarchy-specific class tokens within a Vision Transformer to capture label interactions, employing graph convolutional networks to encode the hierarchical structure, and integrating a self-supervised branch to leverage unlabeled imagery. Evaluated on four remote sensing image datasets, HELM achieves state-of-the-art performance in both supervised and semi-supervised settings, excelling particularly in low-label scenarios. These results demonstrate the effectiveness of explicitly modeling label hierarchies and leveraging unlabeled data in HMLC.
By explicitly modeling label hierarchies with graph learning and leveraging unlabeled data, HELM significantly boosts performance in hierarchical multi-label image classification, especially when labeled data is scarce.
Hierarchical multi-label classification (HMLC) is essential for modeling complex label dependencies in remote sensing. Existing methods, however, struggle with multi-path hierarchies where instances belong to multiple branches, and they rarely exploit unlabeled data. We introduce HELM (Hierarchical and Explicit Label Modeling), a novel framework that overcomes these limitations. HELM: (i) uses hierarchy-specific class tokens within a Vision Transformer to capture nuanced label interactions; (ii) employs graph convolutional networks to explicitly encode the hierarchical structure and generate hierarchy-aware embeddings; and (iii) integrates a self-supervised branch to effectively leverage unlabeled imagery. We perform a comprehensive evaluation on four remote sensing image (RSI) datasets (UCM, AID, DFC-15, MLRSNet). HELM achieves state-of-the-art performance, consistently outperforming strong baselines in both supervised and semi-supervised settings, demonstrating particular strength in low-label scenarios.
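The abstract does not include an implementation, but the minimal PyTorch-style sketch below illustrates the general idea behind components (i) and (ii): one learnable class token per hierarchy level attended alongside the image patch tokens, and label embeddings propagated over the label hierarchy by a small graph convolution. All names (HierarchyTokenViT, label_adj, labels_per_level), layer sizes, and the exact wiring are assumptions made for illustration, not HELM's actual architecture; the self-supervised branch (iii) is omitted.

import torch
import torch.nn as nn

class HierarchyTokenViT(nn.Module):
    """Illustrative sketch (not the published HELM model): a ViT-style encoder
    with one class token per hierarchy level, plus a 2-layer GCN that propagates
    label embeddings along a row-normalized parent-child adjacency matrix."""

    def __init__(self, embed_dim, num_levels, labels_per_level, label_adj):
        super().__init__()
        # One learnable class token per hierarchy level (assumed design).
        self.level_tokens = nn.Parameter(torch.zeros(num_levels, embed_dim))
        # Standard transformer encoder over [level tokens ; patch tokens].
        # embed_dim is assumed divisible by the number of attention heads.
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        # Label embeddings refined by simple GCN message passing over the hierarchy.
        total_labels = sum(labels_per_level)
        self.label_embed = nn.Parameter(torch.randn(total_labels, embed_dim))
        self.register_buffer("adj", label_adj)  # (total_labels, total_labels), row-normalized
        self.gcn1 = nn.Linear(embed_dim, embed_dim)
        self.gcn2 = nn.Linear(embed_dim, embed_dim)
        self.labels_per_level = labels_per_level

    def forward(self, patch_tokens):
        # patch_tokens: (B, N, D) output of a standard ViT patch embedding.
        B = patch_tokens.size(0)
        tokens = self.level_tokens.unsqueeze(0).expand(B, -1, -1)
        x = self.encoder(torch.cat([tokens, patch_tokens], dim=1))
        level_feats = x[:, : self.level_tokens.size(0)]          # (B, num_levels, D)
        # Hierarchy-aware label embeddings via two rounds of neighborhood aggregation.
        h = torch.relu(self.gcn1(self.adj @ self.label_embed))
        h = self.gcn2(self.adj @ h)                               # (total_labels, D)
        # Score each label against the class token of its own level (dot product).
        logits, start = [], 0
        for lvl, n in enumerate(self.labels_per_level):
            logits.append(level_feats[:, lvl] @ h[start : start + n].T)
            start += n
        return torch.cat(logits, dim=1)                           # (B, total_labels) multi-label logits

Under this sketch, a standard multi-label objective such as nn.BCEWithLogitsLoss over the concatenated per-level logits would be the natural supervised training loss; how the self-supervised branch and the semi-supervised objective are combined with it is specific to the paper and not reproduced here.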