DAMO, SYSU | Feb 16, 2026 | arXiv:2602.15236

BindCLIP: A Unified Contrastive-Generative Representation Learning Framework for Virtual Screening

AI Summary

The paper introduces BindCLIP, a virtual screening framework that combines CLIP-style contrastive learning with a pocket-conditioned diffusion model for binding pose generation. By adding pose-level supervision, BindCLIP aims to learn interaction-aware embeddings that are more sensitive to fine-grained binding interactions. The framework also uses hard-negative augmentation to reduce shortcut reliance and a ligand-ligand anchoring regularizer to prevent representation collapse. The authors report improved performance on virtual screening benchmarks, especially in out-of-distribution settings.
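To make the shared-embedding idea concrete, the following is a minimal sketch of how a screening library could be ranked once pocket and ligand embeddings exist. It is an illustration only: the use of cosine similarity, the function names `cosine` and `rank_ligands`, and the toy two-dimensional vectors are all assumptions, not details taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_ligands(pocket_emb, ligand_embs):
    """Return (name, score) pairs sorted by descending similarity to the pocket.

    pocket_emb  : embedding of the target pocket (hypothetical encoder output)
    ligand_embs : dict mapping ligand name -> embedding (hypothetical)
    """
    scored = [(name, cosine(pocket_emb, emb)) for name, emb in ligand_embs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy data standing in for real encoder outputs (assumed, not from the paper).
pocket = [1.0, 0.0]
library = {
    "lig_a": [0.9, 0.1],   # nearly aligned with the pocket
    "lig_b": [0.0, 1.0],   # orthogonal
    "lig_c": [-1.0, 0.0],  # opposite direction
}
ranking = rank_ligands(pocket, library)
```

Because ligand embeddings can be computed once and reused across pockets, scoring a library in this style reduces to one similarity computation per ligand, which is what makes the approach scale to large libraries.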

Key Contribution

By unifying contrastive learning with pocket-conditioned generative modeling of binding poses, BindCLIP produces interaction-aware embeddings that improve virtual screening, with the largest gains reported in challenging out-of-distribution scenarios.

Abstract

Virtual screening aims to efficiently identify active ligands from massive chemical libraries for a given target pocket. Recent CLIP-style models such as DrugCLIP enable scalable virtual screening by embedding pockets and ligands into a shared space. However, our analyses indicate that such representations can be insensitive to fine-grained binding interactions and may rely on shortcut correlations in training data, limiting their ability to rank ligands by true binding compatibility. To address these issues, we propose BindCLIP, a unified contrastive-generative representation learning framework for virtual screening. BindCLIP jointly trains pocket and ligand encoders using CLIP-style contrastive learning together with a pocket-conditioned diffusion objective for binding pose generation, so that pose-level supervision directly shapes the retrieval embedding space toward interaction-relevant features. To further mitigate shortcut reliance, we introduce hard-negative augmentation and a ligand-ligand anchoring regularizer that prevents representation collapse. Experiments on two public benchmarks demonstrate consistent improvements over strong baselines. BindCLIP achieves substantial gains on challenging out-of-distribution virtual screening and improves ligand-analogue ranking on the FEP+ benchmark. Together, these results indicate that integrating generative, pose-level supervision with contrastive learning yields more interaction-aware embeddings and improves generalization in realistic screening settings, bringing virtual screening closer to real-world applicability.
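The training objective described in the abstract can be sketched as a weighted sum of a CLIP-style contrastive term, a generative (diffusion) term, and an anchoring regularizer. The sketch below is a hedged illustration, not the authors' implementation: the symmetric InfoNCE form, the temperature value, the weights `w_gen` and `w_anchor`, the convention that hard negatives appear as extra ligand columns, and the treatment of the diffusion and anchoring losses as precomputed scalars are all assumptions.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax of a list of floats."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def clip_loss(sim, temperature=0.07):
    """Symmetric InfoNCE loss over a pocket-by-ligand similarity matrix.

    sim[i][j] is the similarity of pocket i and ligand j. The first n
    columns are the matched ligands (pocket i pairs with ligand i); any
    extra columns are treated as hard negatives (an assumed convention).
    """
    n = len(sim)
    logits = [[s / temperature for s in row] for row in sim]
    # Pocket -> ligand: each pocket must pick its ligand among all columns,
    # including the hard negatives.
    p2l = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Ligand -> pocket: each matched ligand must pick its pocket.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    l2p = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (p2l + l2p)

def joint_loss(sim, diffusion_loss, anchor_loss, w_gen=1.0, w_anchor=0.1):
    """Contrastive term plus weighted generative and anchoring terms.

    diffusion_loss and anchor_loss are placeholders for values that the
    pose-generation head and the ligand-ligand regularizer would produce.
    """
    return clip_loss(sim) + w_gen * diffusion_loss + w_anchor * anchor_loss

# Toy similarity matrices (assumed values for illustration).
aligned = [[1.0, 0.0], [0.0, 1.0]]
with_hard_negative = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
total = joint_loss(aligned, diffusion_loss=2.0, anchor_loss=1.0)
```

In this sketch a hard negative that scores as highly as the true ligand raises the pocket-to-ligand term, so the encoders are pushed to separate it from the true binder. Because all three terms are summed into one scalar, the generative and anchoring terms influence the retrieval embedding only if their gradients reach the shared encoders, which is the joint-training setup the abstract describes.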

Multimodal Models | Recommendation & Information Retrieval | Scientific Discovery & Drug Design

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
