This paper introduces Hard-Set-Guided Feature-Space Meta-Learning (HSFM), a bilevel meta-learning approach that improves classifier robustness to spurious correlations by performing augmentation directly in the feature space of a pre-trained backbone. HSFM learns to edit support-set features such that a classifier trained on these edited features achieves lower loss on hard examples and improved worst-group performance. Experiments on benchmark datasets demonstrate that HSFM achieves state-of-the-art performance in handling spurious correlations, while also showing through CLIP-based visualizations that the learned feature-space updates induce semantically meaningful shifts.
Retraining just the classifier head on top of a frozen feature extractor can be dramatically improved by meta-learning feature-space augmentations that target hard examples, yielding state-of-the-art robustness to spurious correlations.
Deep neural networks often rely on spurious features to make predictions, which makes them brittle under distribution shift and on samples where the spurious correlation does not hold (e.g., minority-group examples). Recent studies have shown that, even in such settings, the feature extractor of an Empirical Risk Minimization (ERM)-trained model can learn rich and informative representations, and that much of the failure may be attributed to the classifier head. In particular, retraining a lightweight head while keeping the backbone frozen can substantially improve performance on shifted distributions and minority groups. Motivated by this observation, we propose a bilevel meta-learning method that performs augmentation directly in feature space to make the classifier head robust to spurious correlations. Our method learns support-side feature edits such that, after a small number of inner-loop updates on the edited features, the classifier achieves lower loss on hard examples and improved worst-group performance. By operating at the backbone output rather than in pixel space or through end-to-end optimization, the method is highly efficient and stable, requiring only a few minutes of training on a single GPU. We further validate our method with CLIP-based visualizations, showing that the learned feature-space updates induce semantically meaningful shifts aligned with spurious attributes.
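The bilevel structure described above can be sketched in a few lines. Everything in this sketch is illustrative and not taken from the paper: it uses random stand-in features, a single shared additive edit, a squared loss, one inner-loop gradient step, and a finite-difference outer gradient in place of differentiating through the inner loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for frozen-backbone features (in practice, precomputed embeddings).
d = 3
Z_sup, y_sup = rng.normal(size=(8, d)), rng.normal(size=8)    # support set
Z_hard, y_hard = rng.normal(size=(4, d)), rng.normal(size=4)  # hard examples

def inner_step(delta, w0, lr=0.1):
    """One inner-loop update of a linear head on edited support features."""
    X = Z_sup + delta                      # feature-space edit (shared shift, for simplicity)
    grad_w = 2.0 * X.T @ (X @ w0 - y_sup) / len(y_sup)
    return w0 - lr * grad_w

def outer_loss(delta):
    """Hard-set loss of the head obtained after the inner update."""
    w1 = inner_step(delta, np.zeros(d))
    return float(np.mean((Z_hard @ w1 - y_hard) ** 2))

# Outer loop: descend on the edit; a central finite difference stands in
# for backpropagating through the inner-loop update.
delta = np.zeros(d)
loss0 = outer_loss(delta)
for _ in range(20):
    grad = np.zeros(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = 1e-4
        grad[i] = (outer_loss(delta + e) - outer_loss(delta - e)) / 2e-4
    delta -= 0.05 * grad
loss1 = outer_loss(delta)  # hard-set loss after learning the edit
```

After the outer loop, the edited support features yield a head with lower hard-set loss than the unedited ones, which is the quantity the method's outer objective targets.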