ETH · Feb 19, 2026 · arXiv:2602.17187

Anti-causal domain generalization: Leveraging unlabeled data

Sorawit Saengkyongam, Juan L. Gamella, Andrew C. Miller, Jonas Peters, Nicolai Meinshausen, Christina Heinze-Deml

AI Summary

This paper addresses domain generalization in anti-causal settings, where the outcome causes the covariates, a structure that makes unlabeled data useful. The authors propose regularizing the model's sensitivity to environment perturbations of the covariates, which can be estimated without labels. They prove worst-case optimality guarantees for two methods that penalize sensitivity to variations in the mean and in the covariance of the covariates, and demonstrate their empirical performance on a physical system and a physiological signal dataset.
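The key practical point is that the perturbation directions depend only on the covariates. As a hedged illustration, the sketch below assumes additive environment shifts and estimates how covariate means and covariances vary across environments from unlabeled arrays alone. The helper name `perturbation_directions`, the choice to center against the average over environments, and the toy data generator are all assumptions made for this example, not the authors' procedure.

```python
import numpy as np


def perturbation_directions(envs):
    """Estimate how covariate means and covariances vary across environments.

    Hypothetical helper: `envs` is a list of (n_e, d) arrays of covariates,
    one per environment. No labels are used anywhere.
    """
    # Per-environment first and second moments of the covariates.
    means = np.stack([X.mean(axis=0) for X in envs])           # (E, d)
    covs = np.stack([np.cov(X, rowvar=False) for X in envs])   # (E, d, d)
    # Center against the average over environments (an assumed reference point).
    mean_shifts = means - means.mean(axis=0)
    cov_shifts = covs - covs.mean(axis=0)
    return mean_shifts, cov_shifts


# Toy anti-causal data: Y causes X, and each environment shifts X only.
rng = np.random.default_rng(0)
a = np.array([1.0, 0.5, -0.5])  # assumed effect of Y on X
shifts = [np.zeros(3), np.array([3.0, 0.0, 0.0]), np.array([0.0, 0.0, 3.0])]
envs = []
for s in shifts:
    y = rng.normal(size=2000)  # used to generate X, then discarded
    envs.append(np.outer(y, a) + rng.normal(scale=0.5, size=(2000, 3)) + s)

mean_shifts, cov_shifts = perturbation_directions(envs)
print(np.round(mean_shifts, 1))
```

In this toy setup the rows of `mean_shifts` recover the injected shifts up to centering and sampling noise, even though the labels `y` never reach the estimator.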

Key Contribution

Enables domain generalization from unlabeled data by exploiting the structure of anti-causal relationships, in which outcomes cause covariates.

Abstract

The problem of domain generalization concerns learning predictive models that are robust to distribution shifts when deployed in new, previously unseen environments. Existing methods typically require labeled data from multiple training environments, limiting their applicability when labeled data are scarce. In this work, we study domain generalization in an anti-causal setting, where the outcome causes the observed covariates. Under this structure, environment perturbations that affect the covariates do not propagate to the outcome, which motivates regularizing the model's sensitivity to these perturbations. Crucially, estimating these perturbation directions does not require labels, enabling us to leverage unlabeled data from multiple environments. We propose two methods that penalize the model's sensitivity to variations in the mean and covariance of the covariates across environments, respectively, and prove that these methods have worst-case optimality guarantees under certain classes of environments. Finally, we demonstrate the empirical performance of our approach on a controlled physical system and a physiological signal dataset.
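The abstract describes the mean-based method only at a high level. As a hedged illustration, the sketch below assumes a linear predictor and additive mean shifts, and adds a quadratic penalty lam/E * sum_e (d_e^T b)^2 on how much the prediction moves under each estimated shift direction d_e (E is the number of unlabeled environments). This is a simple swapped-in formulation, not necessarily the authors' estimator, and the covariance-based method is not sketched. The names `fit_mean_shift_penalized`, `lam`, and `sample` are hypothetical.

```python
import numpy as np


def fit_mean_shift_penalized(X, y, unlabeled_envs, lam):
    """Linear least squares plus a penalty on sensitivity to mean shifts.

    Hypothetical sketch: X (n, d) and y (n,) are labeled data from one
    environment; unlabeled_envs is a list of (n_e, d) covariate arrays from
    other environments; lam >= 0 is the penalty strength.
    """
    # Shift directions d_e, estimated from unlabeled covariates only.
    means = np.stack([Z.mean(axis=0) for Z in unlabeled_envs])
    D = means - means.mean(axis=0)  # (E, d)
    # Center the labeled data so the intercept is handled separately.
    x_bar, y_bar = X.mean(axis=0), y.mean()
    Xc, yc = X - x_bar, y - y_bar
    n, E = X.shape[0], D.shape[0]
    # Minimize ||yc - Xc b||^2 / n + lam / E * sum_e (d_e^T b)^2 in closed form.
    A = Xc.T @ Xc / n + lam * (D.T @ D) / E
    b = np.linalg.solve(A, Xc.T @ yc / n)
    return b, y_bar - x_bar @ b


rng = np.random.default_rng(1)
a = np.array([1.0, 1.0, 0.0])  # assumed effect of Y on X


def sample(n, shift):
    """Anti-causal toy model: Y -> X, with an additive environment shift on X."""
    y = rng.normal(size=n)
    return np.outer(y, a) + rng.normal(scale=0.5, size=(n, 3)) + shift, y


X0, y0 = sample(2000, np.zeros(3))  # labeled training environment
unlabeled = [sample(2000, np.array(s))[0]  # labels dropped
             for s in ([0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [-2.0, 0.0, 0.0])]
b_ols, c_ols = fit_mean_shift_penalized(X0, y0, unlabeled, lam=0.0)
b_pen, c_pen = fit_mean_shift_penalized(X0, y0, unlabeled, lam=100.0)

# Unseen environment with a large shift along the first covariate.
Xt, yt = sample(2000, np.array([10.0, 0.0, 0.0]))
mse_ols = np.mean((yt - Xt @ b_ols - c_ols) ** 2)
mse_pen = np.mean((yt - Xt @ b_pen - c_pen) ** 2)
print(f"OLS test MSE: {mse_ols:.2f}, penalized test MSE: {mse_pen:.2f}")
```

With `lam=0.0` the fit reduces to ordinary least squares with an intercept. In this toy setup, a larger `lam` drives the coefficient on the shifted covariate toward zero, trading a little in-distribution accuracy for stability when the test environment moves further along a direction already seen in the unlabeled data.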

Data Curation & Synthetic Data · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
