This paper introduces CLIP-PZSL, a zero-shot learning framework designed to handle ambiguous labels. It uses CLIP to extract instance and label features. A semantic mining block fuses these features into discriminative label embeddings, and a partial zero-shot loss weights each candidate label by its relevance to the instance. During training, the framework progressively identifies the ground-truth labels, which refines the label embeddings, tightens the semantic alignment between instance and label features, and improves ZSL performance.
Zero-shot learning can now handle noisy, ambiguous labels thanks to a CLIP-driven framework that progressively refines labels during training.
Zero-shot learning (ZSL) aims to recognize unseen classes by leveraging semantic information from seen classes, but most existing methods assume accurate class labels for training instances. However, in real-world scenarios, noisy and ambiguous labels can significantly reduce the performance of ZSL. To address this, we propose a new CLIP-driven partial label zero-shot learning (CLIP-PZSL) framework to handle label ambiguity. First, we use CLIP to extract instance and label features. Then, a semantic mining block fuses these features to extract discriminative label embeddings. We also introduce a partial zero-shot loss, which assigns weights to candidate labels based on their relevance to the instance and aligns instance and label embeddings to minimize semantic mismatch. As training proceeds, the ground-truth labels are progressively identified, and the refined labels and label embeddings in turn help improve the semantic alignment of instance and label features. Comprehensive experiments on several datasets demonstrate the advantage of CLIP-PZSL.
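The abstract does not give the exact form of the partial zero-shot loss, so the following is only a minimal sketch of the general idea: a weighted cross-entropy over each instance's candidate labels, followed by a re-weighting step that shifts weight toward the candidates most similar to the instance. Everything here is an assumption made for illustration, including the function names, the cosine-similarity logits, the temperature value of 0.07, and the toy embeddings that stand in for CLIP features and for the output of the semantic mining block.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def similarity_logits(instance_emb, label_emb, temperature=0.07):
    """Temperature-scaled cosine similarity between instances and labels (assumed form)."""
    return l2_normalize(instance_emb) @ l2_normalize(label_emb).T / temperature


def log_softmax(logits):
    """Numerically stable log-softmax over the label axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def init_weights(candidate_mask):
    """Start with uniform weights over each instance's candidate labels."""
    mask = candidate_mask.astype(float)
    return mask / mask.sum(axis=1, keepdims=True)


def partial_label_loss(logits, weights):
    """Weighted negative log-likelihood; non-candidate labels carry zero weight."""
    return float(-(weights * log_softmax(logits)).sum(axis=1).mean())


def update_weights(logits, candidate_mask):
    """Re-weight candidates with a softmax restricted to the candidate set.

    Assumes every instance has at least one candidate label.
    """
    masked = np.where(candidate_mask, logits, -np.inf)
    masked = masked - masked.max(axis=1, keepdims=True)
    probs = np.exp(masked)  # exp(-inf) == 0 for non-candidates
    return probs / probs.sum(axis=1, keepdims=True)


# Toy data: 4 labels with one-hot style embeddings in 8 dimensions (hypothetical).
label_emb = np.eye(4, 8)
true_labels = np.array([0, 2, 3])
# Each instance sits close to its true label embedding.
instance_emb = label_emb[true_labels] + 0.05
# Each row marks an ambiguous candidate set that contains the true label.
candidate_mask = np.array(
    [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [1, 0, 0, 1]],
    dtype=bool,
)

logits = similarity_logits(instance_emb, label_emb)
weights = init_weights(candidate_mask)
loss_before = partial_label_loss(logits, weights)

# One disambiguation step: weight moves toward the best-aligned candidate.
weights = update_weights(logits, candidate_mask)
loss_after = partial_label_loss(logits, weights)
identified = weights.argmax(axis=1)
```

In a real training loop, the loss and the weight update would alternate with gradient steps on the embeddings, so that better embeddings yield sharper weights and sharper weights yield a cleaner training signal. Here the embeddings are fixed, so the sketch shows only a single re-weighting step.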