Mar 5, 2026 · arXiv:2603.05053

CLIP-driven Zero-shot Learning with Ambiguous Labels

Jinfu Fan, Jiangnan Li, Xiaowen Yan, Xiaohui Zhong, Wenpeng Lu, Linqing Huang

AI Summary

This paper introduces CLIP-PZSL, a zero-shot learning framework that handles ambiguous labels by using CLIP to extract instance and label features. A semantic mining block fuses these features to extract discriminative label embeddings, while a partial zero-shot loss function weights each candidate label by its relevance to the instance. During training, the framework progressively identifies ground-truth labels and refines the label embeddings, which tightens the semantic alignment of instance and label features and improves ZSL performance.

Key Contribution

Zero-shot learning can now handle noisy, ambiguous labels thanks to a CLIP-driven framework that progressively refines labels during training.

Abstract

Zero-shot learning (ZSL) aims to recognize unseen classes by leveraging semantic information from seen classes, but most existing methods assume accurate class labels for training instances. In real-world scenarios, however, noisy and ambiguous labels can significantly reduce ZSL performance. To address this, we propose a new CLIP-driven partial label zero-shot learning (CLIP-PZSL) framework to handle label ambiguity. First, we use CLIP to extract instance and label features. Then, a semantic mining block fuses these features to extract discriminative label embeddings. We also introduce a partial zero-shot loss, which assigns weights to candidate labels based on their relevance to the instance and aligns instance and label embeddings to minimize semantic mismatch. As training proceeds, the ground-truth labels are progressively identified, and the refined labels and label embeddings in turn help improve the semantic alignment of instance and label features. Comprehensive experiments on several datasets demonstrate the advantage of CLIP-PZSL.
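The abstract does not give the form of the partial zero-shot loss, so the following is only a minimal sketch of the general idea it describes: weight each candidate label by its relevance to the instance, then align instance and label embeddings under those weights. Everything here is an assumption made for illustration, not the authors' formulation: the function names (`candidate_weights`, `partial_label_loss`), the cosine-similarity relevance score, the `temperature` parameter, and the weighted cross-entropy objective. Random vectors stand in for CLIP features, and the semantic mining block is omitted.

```python
import numpy as np

def _normalize(x):
    # Scale each row to unit L2 norm so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def candidate_weights(inst, label_emb, cand_mask, temperature=0.1):
    """Softmax over instance-label similarities, restricted to candidate labels.

    inst:      (n, d) instance features (stand-ins for CLIP image features)
    label_emb: (c, d) label embeddings (stand-ins for CLIP text features)
    cand_mask: (n, c) boolean; True where a label is in the candidate set.
               Assumes every instance has at least one candidate.
    """
    logits = _normalize(inst) @ _normalize(label_emb).T / temperature
    logits = np.where(cand_mask, logits, -np.inf)   # non-candidates get weight 0
    logits = logits - logits.max(axis=1, keepdims=True)
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def partial_label_loss(inst, label_emb, weights, temperature=0.1):
    """Cross-entropy against the (fixed) candidate weights, averaged over instances."""
    logits = _normalize(inst) @ _normalize(label_emb).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-(weights * log_prob).sum(axis=1).mean())

# Toy data: 4 instances, 5 seen classes, 2 candidate labels per instance.
rng = np.random.default_rng(0)
inst = rng.normal(size=(4, 8))
label_emb = rng.normal(size=(5, 8))
cand_mask = np.zeros((4, 5), dtype=bool)
cand_mask[np.arange(4), [0, 1, 2, 3]] = True
cand_mask[np.arange(4), [1, 2, 3, 4]] = True

w = candidate_weights(inst, label_emb, cand_mask)
loss = partial_label_loss(inst, label_emb, w)
```

In a training loop of this kind, the weights would be recomputed from the current embeddings after each update, so that mass gradually concentrates on one candidate per instance. That mirrors the progressive identification of ground-truth labels the abstract mentions, though the paper's actual update rule may differ.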

Computer Vision, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 24

Year: 2026

Venue: N/A
