This paper introduces Anchor PCA, a variant of principal component analysis (PCA) for data from multiple related domains. Instead of pooling the data, which can yield spurious components driven by only a few domains, it identifies directions of variation that are shared across domains. By trading off overall explained variance against agreement between the shared and domain-specific embeddings, Anchor PCA recovers a maximal invariant subspace. On simulated data it recovers that subspace, and on real-world gas sensor data with temporal drift its embeddings explain more variance on unseen domains than the pooling baseline and a worst-case alternative.
Anchor PCA shows that focusing on variation shared across domains can make unsupervised dimension reduction more robust: its embeddings explain more variance on unseen domains than PCA on pooled data.
Principal component analysis (PCA) is one of the most widely used unsupervised dimension reduction techniques. We study PCA for data from multiple related domains. Since principal components generally differ across domains, one way to obtain a shared low-rank embedding is to perform PCA on the pooled data. However, this approach can focus on spurious directions that exhibit high variation in only a few domains. To find a robust embedding that still explains most variance in unseen but similar domains, we propose instead to focus on shared directions of variation. To this end, we introduce Anchor PCA, which trades off overall explained variance against agreement between the shared and domain-specific low-rank embeddings. Anchor PCA amounts to PCA on a modified target matrix and thus can be solved efficiently. Moreover, we show that Anchor PCA recovers a maximal invariant subspace and admits a minimax reconstruction interpretation under bounded domain-specific covariance inflations. On simulated and real-world gas sensor data with temporal drift, we demonstrate, respectively, that Anchor PCA recovers the maximal invariant subspace and yields embeddings that explain more variance on unseen domains than the pooling baseline and a worst-case alternative. Taken together, these findings establish Anchor PCA as a promising approach to robust unsupervised dimension reduction from multi-domain data.
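The failure mode of the pooling baseline described above can be illustrated with a small NumPy sketch. It does not implement Anchor PCA, whose modified target matrix is not specified here; it only shows how PCA on pooled data can select a direction that has high variance in a single domain. The five domain covariance matrices, the helper names `top_components` and `explained_fraction`, and the hand-picked shared direction are all assumptions made for illustration.

```python
import numpy as np


def top_components(cov, k):
    """Return the k leading eigenvectors (as columns) of a symmetric matrix."""
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest
    return eigvecs[:, order]


def explained_fraction(cov, V):
    """Fraction of a domain's total variance captured by the span of V."""
    return float(np.trace(V.T @ cov @ V) / np.trace(cov))


# Hypothetical population covariances for 5 zero-mean, equally sized domains.
# Axis 0: shared direction, variance 4 in every domain.
# Axis 1: variance 30 in domain 0 only, 0.1 elsewhere (the "spurious" one).
# Axis 2: low-variance noise.
domain_covs = [np.diag([4.0, 30.0, 0.1])]
domain_covs += [np.diag([4.0, 0.1, 0.1]) for _ in range(4)]

# Pooling baseline: PCA on the average covariance across domains.
pooled_cov = np.mean(domain_covs, axis=0)
pooled_dir = top_components(pooled_cov, k=1)

# Hand-picked shared direction (axis 0), used here only as a reference point.
shared_dir = np.array([[1.0], [0.0], [0.0]])

pooled_scores = [explained_fraction(c, pooled_dir) for c in domain_covs]
shared_scores = [explained_fraction(c, shared_dir) for c in domain_covs]

print("pooled PCA, per-domain explained variance:", np.round(pooled_scores, 3))
print("shared axis, per-domain explained variance:", np.round(shared_scores, 3))
```

In this toy setup the pooled covariance has more variance along axis 1 than along axis 0, so the leading pooled component follows the direction that matters in domain 0 only and explains very little variance in the other four domains. The shared axis does worse on domain 0 but far better on the rest, which is the kind of trade-off the abstract describes.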