ETH | Jun 4, 2026 | arXiv:2606.06233

Anchor PCA

Benedikt Seiter, Anya Fries, Julius von Kügelgen, Jonas Peters

AI Summary

This paper introduces Anchor PCA, a variant of principal component analysis (PCA) for data from multiple related domains. Rather than running PCA on the pooled data, which can latch onto spurious directions that vary strongly in only a few domains, it targets directions of variation shared across domains. Anchor PCA trades off overall explained variance against agreement between the shared and domain-specific low-rank embeddings, and the authors show that it recovers a maximal invariant subspace. In simulations it recovers that subspace, and on real-world gas sensor data with temporal drift its embeddings explain more variance on unseen domains than the pooling baseline and a worst-case alternative.

Key Contribution

Anchor PCA targets variation shared across domains to make unsupervised dimension reduction more robust: on gas sensor data with temporal drift, its embeddings explain more variance on unseen domains than the pooling baseline.

Abstract

Principal component analysis (PCA) is one of the most widely used unsupervised dimension reduction techniques. We study PCA for data from multiple related domains. Since principal components generally differ across domains, one way to obtain a shared low-rank embedding is to perform PCA on the pooled data. However, this approach can focus on spurious directions that exhibit high variation in only a few domains. To find a robust embedding that still explains most variance in unseen but similar domains, we propose instead to focus on shared directions of variation. To this end, we introduce Anchor PCA which trades off overall explained variance with agreement between the shared and domain-specific low-rank embeddings. Anchor PCA amounts to PCA on a modified target matrix and thus can be solved efficiently. Moreover, we show that Anchor PCA recovers a maximal invariant subspace and admits a minimax reconstruction interpretation under bounded domain-specific covariance inflations. On simulated and real-world gas sensor data with temporal drift, we demonstrate, respectively, that Anchor PCA recovers the maximally invariant subspace and yields embeddings that explain more variance on unseen domains than the pooling baseline and a worst-case alternative. Taken together, these findings establish Anchor PCA as a promising approach to robust unsupervised dimension reduction from multi-domain data.
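The failure mode of pooling described in the abstract can be illustrated with a small sketch. This is not the Anchor PCA algorithm itself (its modified target matrix is not given on this page); it only shows how PCA on pooled data can select a direction that is highly variable in a single domain. The three diagonal covariance matrices, the unseen domain, and the helper names `top_directions` and `explained_variance_fraction` are all assumptions made up for illustration, and the "shared" direction is picked by hand rather than estimated.

```python
import numpy as np


def top_directions(cov, k):
    """Return the k leading eigenvectors (as columns) of a symmetric matrix."""
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k]          # reorder to descending, keep k


def explained_variance_fraction(cov, V):
    """Fraction of the total variance of `cov` captured by orthonormal columns V."""
    return np.trace(V.T @ cov @ V) / np.trace(cov)


# Hypothetical population covariances for three zero-mean training domains in R^3.
# Axis 0: shared direction, variance 4 in every domain.
# Axis 1: spurious direction, variance 30 in the first domain only.
# Axis 2: low-variance noise.
train_covs = [
    np.diag([4.0, 30.0, 0.1]),
    np.diag([4.0, 0.1, 0.1]),
    np.diag([4.0, 0.1, 0.1]),
]

# Pooling baseline: with equal sample sizes and zero means, the pooled
# covariance is the average of the domain covariances.
pooled_cov = sum(train_covs) / len(train_covs)
V_pooled = top_directions(pooled_cov, k=1)

# Hand-picked shared direction (axis 0), used only as a point of comparison.
V_shared = np.array([[1.0], [0.0], [0.0]])

# Hypothetical unseen domain that resembles the two domains without the spike.
unseen_cov = np.diag([4.0, 0.1, 0.1])

pooled_frac = explained_variance_fraction(unseen_cov, V_pooled)
shared_frac = explained_variance_fraction(unseen_cov, V_shared)

print("pooled PCA direction:", np.abs(V_pooled[:, 0]))
print("explained variance on unseen domain (pooled):", pooled_frac)
print("explained variance on unseen domain (shared):", shared_frac)
```

In this toy setup the pooled covariance has its largest eigenvalue along the spurious axis, so the rank-1 pooled embedding explains only a small share of the variance in the unseen domain, while the shared axis explains most of it. Diagonal covariances keep the example deterministic; with sampled data the same pattern would hold only approximately.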

Data Curation & Synthetic Data

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
