SYSU, UVA | Feb 17, 2026 | arXiv:2602.15330

A Scalable Curiosity-Driven Game-Theoretic Framework for Long-Tail Multi-Label Learning in Data Mining

AI Summary

The paper introduces Curiosity-Driven Game-Theoretic Multi-Label Learning (CD-GTMLL) to address the long-tail distribution problem in large-scale multi-label classification. It frames the task as a cooperative game in which each sub-predictor specializes in one partition of the label space. CD-GTMLL adds intrinsic curiosity rewards, based on tail label rarity and inter-player disagreement, that adaptively inject learning signals into under-represented tail labels without manual tuning. Empirical results on seven benchmarks, including datasets with over 30,000 labels, show that CD-GTMLL outperforms state-of-the-art methods and improves the Rare-F1 metric, with gains of up to +1.6% P@3 on Wiki10-31K.
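For readers unfamiliar with the P@3 figure quoted above, the following is a minimal sketch of precision at k as it is commonly defined in extreme multi-label classification: the fraction of the k highest-scored labels that are relevant, averaged over samples. The function name `precision_at_k` and the toy arrays are illustrative assumptions, not taken from the paper, whose evaluation code may differ in detail.

```python
import numpy as np

def precision_at_k(scores, labels, k=3):
    """Mean fraction of the k highest-scored labels that are relevant.

    scores: (N, L) array of predicted scores (hypothetical model output).
    labels: (N, L) binary array of ground-truth labels.
    """
    # Indices of the k largest scores in each row, highest first.
    topk = np.argsort(-scores, axis=1)[:, :k]
    # Look up the ground-truth value at each of those indices.
    hits = np.take_along_axis(labels, topk, axis=1)
    # Average the per-sample hit counts, then divide by k.
    return hits.sum(axis=1).mean() / k

# Toy data: 2 samples, 4 labels.
scores = np.array([[0.9, 0.8, 0.1, 0.7],
                   [0.2, 0.6, 0.5, 0.4]])
labels = np.array([[1, 0, 0, 1],
                   [0, 1, 0, 0]])
p3 = precision_at_k(scores, labels, k=3)  # (2/3 + 1/3) / 2 = 0.5
```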

Key Contribution

Forget brittle hyperparameter tuning: a game-theoretic approach adaptively boosts performance on rare labels in extreme multi-label classification by rewarding curiosity about tail labels.

Abstract

The long-tail distribution, where a few head labels dominate while rare tail labels abound, poses a persistent challenge for large-scale Multi-Label Classification (MLC) in real-world data mining applications. Existing resampling and reweighting strategies often disrupt inter-label dependencies or require brittle hyperparameter tuning, especially as the label space expands to tens of thousands of labels. To address this issue, we propose Curiosity-Driven Game-Theoretic Multi-Label Learning (CD-GTMLL), a scalable cooperative framework that recasts long-tail MLC as a multi-player game: each sub-predictor ("player") specializes in a partition of the label space, collaborating to maximize global accuracy while pursuing intrinsic curiosity rewards based on tail label rarity and inter-player disagreement. This mechanism adaptively injects learning signals into under-represented tail labels without manual balancing or tuning. We further provide a theoretical analysis showing that CD-GTMLL converges to a tail-aware equilibrium and formally links the optimization dynamics to improvements in the Rare-F1 metric. Extensive experiments across 7 benchmarks, including extreme multi-label classification datasets with 30,000+ labels, demonstrate that CD-GTMLL consistently surpasses state-of-the-art methods, with gains of up to +1.6% P@3 on Wiki10-31K. Ablation studies further confirm the contributions of both game-theoretic cooperation and curiosity-driven exploration to robust tail performance. By integrating game theory with curiosity mechanisms, CD-GTMLL not only enhances model efficiency in resource-constrained environments but also paves the way for more adaptive learning in imbalanced data scenarios across industries like e-commerce and healthcare.
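The abstract does not give the exact form of the curiosity reward, so the following is only an illustrative sketch of how a per-label bonus could combine the two stated ingredients, tail label rarity and inter-player disagreement. Everything here is an assumption made for illustration: the function name `curiosity_reward`, the use of negative log frequency for rarity, the use of variance across players for disagreement, the product that combines them, and the simplification that every player scores every label (in the paper, players specialize in partitions of the label space).

```python
import numpy as np

def curiosity_reward(label_freq, player_probs, eps=1e-12):
    """Hypothetical per-label curiosity bonus: rarity times disagreement.

    label_freq:   (L,) fraction of training samples carrying each label.
    player_probs: (P, N, L) predicted probabilities from P players
                  on N samples (assumes all players score all labels).
    """
    # Rarity: negative log frequency, scaled so the rarest label gets 1.
    rarity = -np.log(np.clip(label_freq, eps, 1.0))
    rarity = rarity / max(rarity.max(), eps)
    # Disagreement: variance across players, averaged over samples.
    disagreement = player_probs.var(axis=0).mean(axis=0)
    # Labels that are both rare and contested receive the largest bonus.
    return rarity * disagreement

# Toy data: 2 players, 1 sample, 2 labels (a head label and a tail label).
label_freq = np.array([0.5, 0.01])
player_probs = np.array([[[0.9, 0.2]],
                         [[0.9, 0.8]]])
reward = curiosity_reward(label_freq, player_probs)
```

In this toy case the players agree on the head label, so its bonus is zero, while the rare label they disagree on receives a positive bonus. A multiplicative form is only one possible choice; an additive or learned combination would fit the abstract's description equally well.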

Data Curation & Synthetic Data | Recommendation & Information Retrieval | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
