Mar 15, 2026 | arXiv:2603.14222

Membership Inference for Contrastive Pre-training Models with Text-only PII Queries

Ruoxi Cheng, Yizhong Ding, Hongyi Zhang, Yiyan Huang

AI Summary

This paper introduces the Unimodal Membership Inference Detector (UMID), a text-only auditing framework for contrastive pre-training models such as CLIP and CLAP, motivated by concerns that these models memorize Personally Identifiable Information (PII). UMID uses text-guided cross-modal latent inversion to extract similarity and variability signals, then compares them against a non-member reference built from synthetic gibberish. Experiments show that UMID significantly improves both the effectiveness and the efficiency of membership inference compared to prior methods, achieving strong detection performance at sub-second auditing cost.

Key Contribution

You can now audit CLIP and CLAP models for PII memorization using *only* text queries, sidestepping the need for risky biometric inputs and computationally expensive shadow models.

Abstract

Contrastive pre-training models such as CLIP and CLAP underpin many vision-language and audio-language systems, yet their reliance on web-scale data raises growing concerns about memorizing Personally Identifiable Information (PII). Auditing such models via membership inference is challenging in practice: shadow-model membership inference attacks (MIAs) are computationally prohibitive for large multimodal backbones, and existing multimodal attacks typically require querying the target with paired biometric inputs, thereby directly exposing sensitive biometric information to the target model. We propose the Unimodal Membership Inference Detector (UMID), a text-only auditing framework that performs text-guided cross-modal latent inversion and extracts two complementary signals: similarity (alignment to the queried text) and variability (consistency across randomized inversions). UMID compares these statistics to a lightweight non-member reference constructed from synthetic gibberish and makes decisions via an ensemble of unsupervised anomaly detectors. Comprehensive experiments across diverse CLIP and CLAP architectures demonstrate that UMID significantly improves both effectiveness and efficiency over prior MIAs, delivering strong detection performance with sub-second auditing cost while complying with realistic privacy constraints.
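The pipeline described in the abstract can be illustrated with a rough sketch. Everything below is an assumption made for illustration and is not the authors' implementation: `toy_text_embed` and the linear map `W` are hypothetical stand-ins for a real CLIP or CLAP text encoder and image/audio encoder, the variability statistic (mean per-dimension standard deviation) is one possible choice, and the two anomaly detectors (max z-score and Mahalanobis distance) are simple substitutes for whatever ensemble the paper uses. Because the toy encoders memorize nothing, the output carries no privacy meaning; the sketch only shows how the pieces fit together.

```python
import hashlib
import string
import numpy as np

DIM = 32  # toy embedding size (assumption)

def toy_text_embed(text: str) -> np.ndarray:
    """Hypothetical stand-in for a text encoder: one deterministic unit vector per string."""
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:4], "little")
    v = np.random.default_rng(seed).standard_normal(DIM)
    return v / np.linalg.norm(v)

# Hypothetical stand-in for the other modality's encoder: a fixed linear map.
W = np.random.default_rng(0).standard_normal((DIM, DIM)) / np.sqrt(DIM)

def invert(text_emb, rng, steps=50, lr=0.5):
    """Text-guided inversion: gradient ascent on cosine(W z, text_emb) from a random latent z."""
    z = rng.standard_normal(DIM)
    for _ in range(steps):
        e = W @ z
        n = np.linalg.norm(e)
        # Gradient of cosine(e, t) with respect to e, for unit-norm t.
        grad_e = text_emb / n - (e @ text_emb) * e / n**3
        z = z + lr * (W.T @ grad_e)
    return W @ z

def signals(text, n_runs=8, seed=0):
    """Return [similarity, variability] over several randomized inversions."""
    rng = np.random.default_rng(seed)
    t = toy_text_embed(text)
    embs = np.stack([invert(t, rng) for _ in range(n_runs)])
    embs /= np.linalg.norm(embs, axis=1, keepdims=True)
    similarity = float(np.mean(embs @ t))              # alignment to the queried text
    variability = float(np.mean(np.std(embs, axis=0)))  # spread across inversions
    return np.array([similarity, variability])

def gibberish(rng, length=12):
    """Synthetic non-member query: a random lowercase string."""
    return "".join(rng.choice(list(string.ascii_lowercase), size=length))

def audit(query, n_ref=40, quantile=0.95, seed=1):
    """Fraction of detectors that flag the query as anomalous w.r.t. the gibberish reference."""
    rng = np.random.default_rng(seed)
    ref = np.stack([signals(gibberish(rng), seed=i) for i in range(n_ref)])
    q = signals(query)
    mu, sd = ref.mean(axis=0), ref.std(axis=0) + 1e-9
    cov_inv = np.linalg.pinv(np.cov(ref, rowvar=False))

    def zscore(x):
        return np.max(np.abs((x - mu) / sd), axis=-1)

    def mahalanobis(x):
        d = x - mu
        sq = np.einsum("...i,ij,...j->...", d, cov_inv, d)
        return np.sqrt(np.maximum(sq, 0.0))

    # Each detector is calibrated on the reference set alone (unsupervised).
    votes = [score(q) > np.quantile(score(ref), quantile) for score in (zscore, mahalanobis)]
    return float(np.mean(votes))

if __name__ == "__main__":
    print(audit("Jane Doe"))  # placeholder name, not real data
```

In a real audit, the two stand-ins would be replaced by the target model's encoders, and only the text query would ever be sent to the model, which is the privacy property the abstract emphasizes.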

Constitutional AI & AI Ethics | Multimodal Models | Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
