This paper tackles domain adaptation under latent confounder shift when the proxies for the latent confounders are imperfect, which violates the completeness assumption required by existing methods. The authors introduce Latent Equivalent Classes (LECs), which group latent confounders that induce the same proxy distribution, and show that point-identification of a robust predictor is still possible if domains are sufficiently diverse in how they mix proxy-induced LECs. They propose Proximal Quasi-Bayesian Active learning (PQAL), which actively queries diverse domains that satisfy a cross-domain rank condition, and report improved robustness and performance on synthetic and semi-synthetic data.
Even with imperfect proxies for latent confounders, robust predictors can still be uniquely identified across domains, provided those domains are sufficiently diverse in their latent structure.
Domain adaptation becomes more challenging when distribution shifts across domains stem from latent confounders that affect both covariates and outcomes. Existing proxy-based approaches that address latent shift rely on a strong completeness assumption to uniquely determine (point-identify) a robust predictor. Completeness requires that proxies carry sufficient information about variations in the latent confounders. For imperfect proxies, the mapping from confounders to the space of proxy distributions is non-injective, so multiple latent confounder values can generate the same proxy distribution. This breaks the completeness assumption, and the observed data are then consistent with multiple potential predictors (set-identified). To address this, we introduce latent equivalent classes (LECs), defined as groups of latent confounders that induce the same conditional proxy distribution. We show that point-identification of the robust predictor remains achievable as long as multiple domains differ sufficiently in how they mix proxy-induced LECs to form the robust predictor. This domain diversity condition is formalized as a cross-domain rank condition on the mixture weights, which is a substantially weaker assumption than completeness. We introduce the Proximal Quasi-Bayesian Active learning (PQAL) framework, which actively queries a minimal set of diverse domains that satisfy this rank condition. PQAL efficiently recovers the point-identified predictor, is robust to varying degrees of shift, and outperforms previous methods on synthetic data and the semi-synthetic dSprites dataset.
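The two ideas in the abstract, grouping confounder values into LECs and checking a rank condition on how domains mix them, can be illustrated with a small discrete toy. This is only a sketch under assumed simplifications, not the paper's PQAL procedure: the latent confounder and proxy are taken to be discrete, and the probability tables, the helper names `latent_equivalent_classes` and `satisfies_rank_condition`, and the domain weights are all hypothetical.

```python
import numpy as np

# Hypothetical toy setup: 4 latent confounder values, 3 proxy values.
# Each column is P(W | U = u). Columns 0 and 1 coincide, as do 2 and 3,
# so the proxy cannot distinguish those confounder values (non-injective map).
P_W_given_U = np.array([
    [0.7, 0.7, 0.1, 0.1],
    [0.2, 0.2, 0.3, 0.3],
    [0.1, 0.1, 0.6, 0.6],
])


def latent_equivalent_classes(p_w_given_u, decimals=8):
    """Group confounder values whose conditional proxy distributions coincide."""
    classes = {}
    for u, column in enumerate(p_w_given_u.T):
        key = tuple(np.round(column, decimals))
        classes.setdefault(key, []).append(u)
    return list(classes.values())


def satisfies_rank_condition(domain_weights, n_classes):
    """Check that the domains' LEC mixture weights span all LECs.

    Rows are domains, columns are the weights a domain places on each LEC.
    """
    return bool(np.linalg.matrix_rank(domain_weights) == n_classes)


lecs = latent_equivalent_classes(P_W_given_U)

# Assumed mixture weights for two pairs of domains.
diverse_domains = np.array([[0.9, 0.1], [0.2, 0.8]])
similar_domains = np.array([[0.5, 0.5], [0.5, 0.5]])

print("LECs:", lecs)
print("diverse domains pass:", satisfies_rank_condition(diverse_domains, len(lecs)))
print("similar domains pass:", satisfies_rank_condition(similar_domains, len(lecs)))
```

In this toy, the two domains that mix the LECs differently yield a full-rank weight matrix, while two domains with identical mixtures do not, which is the intuition behind querying diverse domains.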