May 28, 2026

arXiv:2605.29720

Efficient, Validation-Free Intrinsic Quality Estimation for Large-Scale Face Recognition Datasets

Zhichao Chen, Yongle Zhao, Kaicheng Yang, Meng Yang, Yin Xie, Ziyong Feng

AI Summary

This paper introduces Intrinsic Quality (IQ), a validation-free metric for estimating the potential of face recognition datasets by combining a Neighbor-Consistency Score with Global Representation Subspace Complexity (Effective Rank). IQ can be computed with lightweight proxy models or data subsets, so a dataset can be assessed before full-scale training. The paper describes an experimental protocol for clean, noisy, and mixed-quality datasets and outlines how to validate IQ's ability to predict downstream performance.

Key Contribution

IQ estimates the quality of a face recognition dataset from lightweight proxy models or data subsets, so datasets can be diagnosed and curated before committing to costly full-scale training.

Abstract

We propose Intrinsic Quality (IQ), a validation-free metric designed to estimate the inherent potential of face recognition (FR) datasets to produce high-performance models without the need for full-scale training. IQ integrates two components: (i) a Neighbor-Consistency Score that quantifies local identity label agreement via nearest neighbors, and (ii) Global Representation Subspace Complexity (Effective Rank, ER), which captures the underlying embedding geometry and dataset diversity. IQ allows for rapid evaluation using lightweight proxy models or data subsets, facilitating dataset diagnosis and curation prior to resource-intensive full-scale training. We describe an experimental protocol tailored to clean, noisy, and mixed-quality FR datasets, and outline evaluation methodologies to validate IQ's predictive power for downstream performance.
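The two components named in the abstract can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names `neighbor_consistency` and `effective_rank` are hypothetical, neighbors are found by cosine similarity with a hypothetical parameter `k`, and effective rank is computed as the exponential of the Shannon entropy of the normalized singular values of the uncentered embedding matrix. The abstract does not say how the two components are combined into IQ, so the sketch returns them separately.

```python
import numpy as np


def neighbor_consistency(embeddings, labels, k=5):
    """Mean fraction of each sample's k nearest neighbors sharing its label.

    Assumes non-zero embedding rows and k < number of samples.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    # L2-normalize rows so the dot product equals cosine similarity.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = unit @ unit.T
    # Exclude each sample from its own neighbor list.
    np.fill_diagonal(sim, -np.inf)
    # Indices of the k most similar samples per row (order within the k is arbitrary).
    neighbors = np.argpartition(-sim, k - 1, axis=1)[:, :k]
    agreement = labels[neighbors] == labels[:, None]
    return float(agreement.mean())


def effective_rank(embeddings):
    """exp(entropy) of the normalized singular value distribution."""
    singular_values = np.linalg.svd(np.asarray(embeddings, dtype=float), compute_uv=False)
    p = singular_values / singular_values.sum()
    p = p[p > 0]  # drop exact zeros so log is defined
    return float(np.exp(-(p * np.log(p)).sum()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in for proxy-model embeddings: 3 identities, 20 samples each.
    centers = rng.normal(size=(3, 16))
    feats = np.repeat(centers, 20, axis=0) + 0.05 * rng.normal(size=(60, 16))
    ids = np.repeat(np.arange(3), 20)
    print("neighbor consistency:", neighbor_consistency(feats, ids, k=5))
    print("effective rank:", effective_rank(feats))
```

Under these assumptions, label noise should lower the neighbor-consistency value, while a dataset whose embeddings collapse onto a few directions should show a low effective rank. The dense similarity matrix is quadratic in the number of samples, so a large-scale dataset would need a data subset or an approximate nearest-neighbor index instead.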

Computer Vision | Data Curation & Synthetic Data | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 41

Year: 2026

Venue: N/A
