This paper introduces an attribute-aware face recognition architecture that learns face embeddings by jointly considering identity labels, identity-relevant facial attributes, and non-identity-related attributes. Attributes are organized into interpretable groups so that the contribution of each group to embedding discriminability can be analyzed separately. Results on face verification benchmarks show that supervising with identity-relevant attribute subsets outperforms supervising with broader attribute sets, and that explicitly unlearning non-identity-related attributes further improves performance.
Forget what you know: explicitly unlearning non-identity-related facial attributes further improves face verification accuracy, suggesting that face recognition encoders can rely on shortcut correlations between identities and attributes unrelated to identity.
Despite recent advances in face recognition, robust performance remains challenging under large variations in age, pose, and occlusion. A common strategy to address these issues is to guide representation learning with auxiliary supervision from facial attributes, encouraging the visual encoder to focus on identity-relevant regions. However, existing approaches typically rely on heterogeneous, fixed sets of attributes and implicitly assume that all attributes are equally relevant. This assumption is suboptimal: attributes differ in their discriminative power for identity recognition, and some may even introduce harmful biases. In this paper, we propose an attribute-aware face recognition architecture that supervises the learning of facial embeddings using identity class labels, identity-relevant facial attributes, and non-identity-related attributes. Facial attributes are organized into interpretable groups, which makes it possible to decompose and analyze their individual contributions in a human-understandable manner. Experiments on standard face verification benchmarks show that jointly learning identity and facial attributes improves the discriminability of face embeddings, and support two main conclusions: (i) supervision with identity-relevant subsets of facial attributes consistently outperforms supervision with a broader attribute set, and (ii) explicitly forcing embeddings to unlearn non-identity-related attributes yields further performance gains compared to leaving such attributes unsupervised. Additionally, our method serves as a diagnostic tool for assessing the trustworthiness of face recognition encoders: it measures the accuracy gain obtained by suppressing non-identity-related attributes, and such a gain suggests that the encoder has learned shortcuts from redundant attributes associated with each identity.
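To make the three supervision signals concrete, the sketch below combines them into a single per-sample training objective. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not specify the identity loss or the unlearning mechanism, so this sketch assumes (a) plain softmax cross-entropy for identity classification, where practical systems often use a margin-based softmax instead, (b) binary cross-entropy for the identity-relevant attribute group, and (c) a "confusion" loss for unlearning, which pushes predictions for non-identity-related attributes toward maximal uncertainty (probability 0.5). A gradient-reversal adversary is another common way to unlearn attributes. All function names and the weights `lambda_rel` and `lambda_non` are hypothetical.

```python
import math
from typing import Sequence


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Identity loss: -log softmax(logits)[label], computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_cross_entropy(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Mean BCE over a group of binary attributes (targets may be soft, in [0, 1])."""
    eps = 1e-12
    total = 0.0
    for z, t in zip(logits, targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(logits)


def confusion_loss(logits: Sequence[float]) -> float:
    """Hypothetical unlearning term: BCE against a 0.5 target, shifted by log 2.

    It equals 0 when every non-identity attribute is predicted with probability 0.5
    and grows as the embedding makes those attributes confidently predictable.
    """
    return binary_cross_entropy(logits, [0.5] * len(logits)) - math.log(2.0)


def joint_loss(
    id_logits: Sequence[float],
    id_label: int,
    rel_logits: Sequence[float],
    rel_targets: Sequence[float],
    non_logits: Sequence[float],
    lambda_rel: float = 1.0,
    lambda_non: float = 1.0,
) -> float:
    """Identity loss + weighted identity-relevant attribute loss + weighted unlearning term.

    Setting lambda_non = 0 corresponds to leaving non-identity attributes unsupervised.
    """
    return (
        softmax_cross_entropy(id_logits, id_label)
        + lambda_rel * binary_cross_entropy(rel_logits, rel_targets)
        + lambda_non * confusion_loss(non_logits)
    )


if __name__ == "__main__":
    # Toy logits for one face: 3 identities, 2 identity-relevant and 2 non-identity attributes.
    uncertain = joint_loss([2.0, 0.1, -1.0], 0, [1.5, -0.5], [1.0, 0.0], [0.0, 0.0])
    confident = joint_loss([2.0, 0.1, -1.0], 0, [1.5, -0.5], [1.0, 0.0], [4.0, -4.0])
    print(uncertain < confident)  # confident non-identity predictions are penalized
```

With the confusion term, the encoder is penalized whenever attributes such as background or accessories remain easy to read off the embedding. Comparing runs with `lambda_non > 0` against `lambda_non = 0` mirrors the unsupervised-versus-unlearned comparison described in the abstract.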