Feb 23, 2026 · arXiv:2602.19910

Multi-Modal Representation Learning via Semi-Supervised Rate Reduction for Generalized Category Discovery

Wei He, Xianghan Meng, Zhiyuan Huang, Xianbiao Qi, Rong Xiao, Chun-Guang Li

AI Summary

The paper addresses the Generalized Category Discovery (GCD) problem by proposing a Semi-Supervised Rate Reduction framework (SSR$^2$-GCD) for multi-modal representation learning that emphasizes intra-modality alignment. SSR$^2$-GCD learns cross-modality representations with desired structural properties by explicitly aligning intra-modality relationships, a crucial aspect often overlooked in existing GCD methods. The approach also integrates prompt candidates using Vision Language Models to enhance knowledge transfer, leading to state-of-the-art performance on generic and fine-grained benchmark datasets.

Key Contribution

By explicitly aligning intra-modality relationships, SSR$^2$-GCD unlocks more effective cross-modal representation learning for generalized category discovery.

Abstract

Generalized Category Discovery (GCD) aims to identify both known and unknown categories, with only partial labels given for the known categories, posing a challenging open-set recognition problem. State-of-the-art approaches to the GCD task are usually built on multi-modality representation learning, which depends heavily on inter-modality alignment. However, few of them impose a proper intra-modality alignment to produce a desirable underlying structure in the representation distributions. In this paper, we propose a novel and effective multi-modal representation learning framework for GCD via Semi-Supervised Rate Reduction, called SSR$^2$-GCD, which learns cross-modality representations with desired structural properties by emphasizing proper alignment of intra-modality relationships. Moreover, to boost knowledge transfer, we integrate prompt candidates by leveraging the inter-modal alignment offered by Vision Language Models. We conduct extensive experiments on generic and fine-grained benchmark datasets, demonstrating the superior performance of our approach.
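The abstract names rate reduction but does not state the objective. As a rough illustration only, the sketch below computes the standard coding rate reduction from the maximal coding rate reduction (MCR$^2$) literature for labeled features. This is an assumption about the general family of objectives, not the paper's semi-supervised formulation, which may differ. The function names `coding_rate` and `rate_reduction` and the default `eps` are hypothetical choices made for this example.

```python
import numpy as np


def coding_rate(Z, eps=0.5):
    """Coding rate R(Z) = 1/2 * logdet(I + d / (n * eps^2) * Z^T Z).

    Z holds n feature vectors of dimension d as rows.
    """
    n, d = Z.shape
    scaled_gram = (d / (n * eps ** 2)) * (Z.T @ Z)  # d x d matrix
    _, logdet = np.linalg.slogdet(np.eye(d) + scaled_gram)
    return 0.5 * logdet


def rate_reduction(Z, labels, eps=0.5):
    """Rate of the whole set minus the size-weighted rates of each group.

    Assumption: hard group assignments given by `labels`.
    """
    n = Z.shape[0]
    whole = coding_rate(Z, eps)
    grouped = 0.0
    for c in np.unique(labels):
        Zc = Z[labels == c]
        grouped += (Zc.shape[0] / n) * coding_rate(Zc, eps)
    return whole - grouped
```

Under this standard definition, the value is larger when each group is compact and different groups occupy different directions of the feature space, which is the kind of structural property the abstract refers to.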

Computer Vision, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
