HELM, a novel framework for hierarchical multi-label classification (HMLC), addresses limitations of existing methods by using hierarchy-specific class tokens within a Vision Transformer to capture label interactions, employing graph convolutional networks to encode the hierarchical structure, and integrating a self-supervised branch to leverage unlabeled imagery. Evaluated on four remote sensing image datasets, HELM achieves state-of-the-art performance in both supervised and semi-supervised settings, excelling particularly in low-label scenarios. These results demonstrate the effectiveness of explicitly modeling label hierarchies and leveraging unlabeled data in HMLC.
By explicitly modeling label hierarchies with graph learning and leveraging unlabeled data, HELM significantly boosts performance in hierarchical multi-label image classification, especially when labeled data is scarce.
Hierarchical multi-label classification (HMLC) is essential for modeling complex label dependencies in remote sensing. Existing methods, however, struggle with multi-path hierarchies where instances belong to multiple branches, and they rarely exploit unlabeled data. We introduce HELM (Hierarchical and Explicit Label Modeling), a novel framework that overcomes these limitations. HELM: (i) uses hierarchy-specific class tokens within a Vision Transformer to capture nuanced label interactions; (ii) employs graph convolutional networks to explicitly encode the hierarchical structure and generate hierarchy-aware embeddings; and (iii) integrates a self-supervised branch to effectively leverage unlabeled imagery. We perform a comprehensive evaluation on four remote sensing image (RSI) datasets (UCM, AID, DFC-15, MLRSNet). HELM achieves state-of-the-art performance, consistently outperforming strong baselines in both supervised and semi-supervised settings, demonstrating particular strength in low-label scenarios.
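The abstract does not include an implementation, but the minimal PyTorch-style sketch below illustrates the general idea behind components (i) and (ii): one learnable class token per hierarchy level attended alongside the image patch tokens, and label embeddings propagated over the label hierarchy by a small graph convolution. All names (HierarchyTokenViT, label_adj, labels_per_level), layer sizes, and the exact wiring are assumptions made for illustration, not HELM's actual architecture; the self-supervised branch (iii) is omitted.

import torch
import torch.nn as nn

class HierarchyTokenViT(nn.Module):
    """Illustrative sketch (not the published HELM model): a ViT-style encoder
    with one class token per hierarchy level, plus a 2-layer GCN that propagates
    label embeddings along a row-normalized parent-child adjacency matrix."""

    def __init__(self, embed_dim, num_levels, labels_per_level, label_adj):
        super().__init__()
        # One learnable class token per hierarchy level (assumed design).
        self.level_tokens = nn.Parameter(torch.zeros(num_levels, embed_dim))
        # Standard transformer encoder over [level tokens ; patch tokens].
        # embed_dim is assumed divisible by the number of attention heads.
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        # Label embeddings refined by simple GCN message passing over the hierarchy.
        total_labels = sum(labels_per_level)
        self.label_embed = nn.Parameter(torch.randn(total_labels, embed_dim))
        self.register_buffer("adj", label_adj)  # (total_labels, total_labels), row-normalized
        self.gcn1 = nn.Linear(embed_dim, embed_dim)
        self.gcn2 = nn.Linear(embed_dim, embed_dim)
        self.labels_per_level = labels_per_level

    def forward(self, patch_tokens):
        # patch_tokens: (B, N, D) output of a standard ViT patch embedding.
        B = patch_tokens.size(0)
        tokens = self.level_tokens.unsqueeze(0).expand(B, -1, -1)
        x = self.encoder(torch.cat([tokens, patch_tokens], dim=1))
        level_feats = x[:, : self.level_tokens.size(0)]          # (B, num_levels, D)
        # Hierarchy-aware label embeddings via two rounds of neighborhood aggregation.
        h = torch.relu(self.gcn1(self.adj @ self.label_embed))
        h = self.gcn2(self.adj @ h)                               # (total_labels, D)
        # Score each label against the class token of its own level (dot product).
        logits, start = [], 0
        for lvl, n in enumerate(self.labels_per_level):
            logits.append(level_feats[:, lvl] @ h[start : start + n].T)
            start += n
        return torch.cat(logits, dim=1)                           # (B, total_labels) multi-label logits

Under this sketch, a standard multi-label objective such as nn.BCEWithLogitsLoss over the concatenated per-level logits would be the natural supervised training loss; how the self-supervised branch and the semi-supervised objective are combined with it is specific to the paper and not reproduced here.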