The paper introduces Domain-invariant Context Optimization (DiCoOp), an extension of CoOp, to improve the domain generalization capabilities of vision-language models like CLIP. DiCoOp uses adversarial training to learn prompts that are invariant to domain shifts while maintaining class discriminability. Experiments demonstrate that DiCoOp outperforms CoOp on domain generalization tasks across various visual domains.
Adversarial training yields domain-invariant prompts for CLIP, improving domain generalization over standard CoOp prompt tuning.
Large pre-trained vision-language models like CLIP have transformed computer vision by aligning images and text in a shared feature space, enabling robust zero-shot transfer via prompting. Soft-prompting methods such as Context Optimization (CoOp) adapt these models to downstream recognition tasks by learning a set of context vectors. However, CoOp has no explicit mechanism for handling domain shift to unseen distributions. To address this, we propose Domain-invariant Context Optimization (DiCoOp), an extension of CoOp optimized for domain generalization. By employing an adversarial training approach, DiCoOp forces the model to learn domain-invariant prompts while preserving discriminative power for classification. Experimental results show that DiCoOp consistently surpasses CoOp in domain generalization tasks across diverse visual domains.
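The abstract does not spell out the adversarial formulation, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a domain-adversarial setup in which a hypothetical domain discriminator tries to predict the source domain, while the prompt learner minimizes the classification loss and is rewarded when the discriminator fails. The function names, the temperature of 0.01, and the weight `lam` are illustrative choices that do not come from the paper.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    # Negative log-likelihood of the target index.
    return -math.log(softmax(logits)[target])

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def class_logits(image_feat, text_feats, temperature=0.01):
    # CLIP-style scoring: scaled cosine similarity between one image
    # feature and one text feature per class. In CoOp the text features
    # would come from learned context vectors plus the class name;
    # here they are plain vectors (assumption for illustration).
    return [cosine(image_feat, t) / temperature for t in text_feats]

def prompt_objective(cls_logits, label, dom_logits, domain, lam=0.5):
    # Assumed adversarial objective for the prompt learner:
    # keep the class loss low while making the (hypothetical) domain
    # discriminator's loss high, so the value drops when the
    # discriminator cannot tell domains apart.
    cls_loss = cross_entropy(cls_logits, label)
    dom_loss = cross_entropy(dom_logits, domain)
    return cls_loss - lam * dom_loss
```

Under this assumed objective, a discriminator that outputs uniform domain logits gives the prompt learner a lower value than one that identifies the domain confidently, which is the pressure toward domain invariance the abstract describes. In a real training loop the discriminator would be updated to minimize its own domain loss in alternation, or a gradient reversal layer would be used.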