Microsoft Research · Apr 9, 2026 · arXiv:2604.08238

$\oslash$ Source Models Leak What They Shouldn't $\nrightarrow$: Unlearning Zero-Shot Transfer in Domain Adaptation Through Adversarial Optimization

Arnav Devalapally, Poornima Jain, Kartik Srinivas, Vineeth N. Balasubramanian

AI Summary

This paper investigates the privacy risks of source-free domain adaptation (SFDA), showing that SFDA models leak information about source-exclusive classes into the target domain, even when those classes are absent in the target data. To mitigate this, the authors introduce a new machine unlearning setting, SCADA-UL, and propose an adversarial unlearning method that uses a rescaled labeling strategy and adversarial optimization to remove source-exclusive class knowledge during domain adaptation. Experiments demonstrate that the proposed method achieves retraining-level unlearning performance and outperforms baselines in the SCADA-UL setting.

Key Contribution

Even when source data is protected, source-free domain adaptation leaks knowledge of source-exclusive classes into the target domain, creating a privacy risk that can be mitigated with adversarial unlearning.

Abstract

The increasing adaptation of vision models across domains, such as satellite imagery and medical scans, has raised an emerging privacy risk: models may inadvertently retain sensitive source-domain-specific information and leak it in the target domain. This creates a compelling use case for machine unlearning (MU) to protect the privacy of sensitive source-domain data. Among adaptation techniques, source-free domain adaptation (SFDA) presents an especially urgent need for MU: the source data itself is protected, yet the source model exposed during adaptation encodes its influence. Our experiments reveal that existing SFDA methods exhibit strong zero-shot performance on source-exclusive classes in the target domain, indicating that they inadvertently leak knowledge of these classes into the target domain even when the classes are not represented in the target data. We identify and address this risk by proposing an MU setting called SCADA-UL: Unlearning Source-exclusive ClAsses in Domain Adaptation. Existing MU methods do not address this setting because they are not designed to handle data distribution shifts. We propose a new unlearning method in which the model unlearns an adversarially generated forget-class sample during the domain adaptation process, using a novel rescaled labeling strategy and adversarial optimization. We also extend our study to two variants: a continual version of this problem setting, and one where the specific source classes to be forgotten may be unknown. Alongside theoretical interpretations, our comprehensive empirical results show that our method consistently outperforms baselines in the proposed setting while achieving retraining-level unlearning performance on benchmark datasets. Our code is available at https://github.com/D-Arnav/SCADA
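The leakage the abstract describes is observed as zero-shot accuracy on source-exclusive classes in the target domain. The snippet below is a minimal sketch of how such a measurement could be split into forget-class and retained-class accuracy. The function name `per_group_accuracy`, the toy logits, and the choice of class 2 as the source-exclusive class are illustrative assumptions, not taken from the paper or its repository.

```python
import numpy as np


def per_group_accuracy(logits, labels, forget_classes):
    """Return (forget_acc, retain_acc): top-1 accuracy on samples whose true
    label is in forget_classes, and on all remaining samples.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    preds = logits.argmax(axis=1)          # top-1 prediction per sample
    correct = preds == labels
    is_forget = np.isin(labels, list(forget_classes))

    def mean_or_nan(mask):
        # Avoid a division-by-zero warning when a group is empty.
        return float(correct[mask].mean()) if mask.any() else float("nan")

    return mean_or_nan(is_forget), mean_or_nan(~is_forget)


# Toy target-domain evaluation set: classes 0 and 1 are shared,
# class 2 is assumed to be source-exclusive (the class to forget).
toy_logits = [
    [2.0, 1.0, 0.0],  # true class 0, predicted 0
    [0.0, 3.0, 1.0],  # true class 1, predicted 1
    [0.0, 1.0, 4.0],  # true class 2, predicted 2
    [3.0, 0.0, 1.0],  # true class 2, predicted 0
]
toy_labels = [0, 1, 2, 2]

forget_acc, retain_acc = per_group_accuracy(toy_logits, toy_labels, {2})
print(f"forget-class accuracy: {forget_acc:.2f}")
print(f"retained-class accuracy: {retain_acc:.2f}")
```

Under this reading, high forget-class accuracy for a model that never saw those classes in the target data would indicate leakage. A successful unlearning method would be expected to lower it while keeping retained-class accuracy close to that of the adapted model.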

Computer Vision · Data Curation & Synthetic Data · Red-Teaming & Adversarial Robustness

