Search papers, labs, and topics across Lattice.
This paper critiques the use of aggregate accuracy as the primary metric for evaluating facial recognition systems in law enforcement, arguing it masks critical disparities in subgroup error rates. Through empirical analysis, the authors show that systems with comparable overall accuracy can exhibit significantly different fairness profiles when examining false positive and false negative rates across demographic groups. They advocate for fairness-aware evaluation approaches and model-agnostic auditing strategies to address the operational risks of accuracy-centric evaluation in high-stakes environments.
Aggregate accuracy can be dangerously misleading when evaluating facial recognition systems for law enforcement, obscuring significant disparities in error rates across demographic subgroups.
Facial recognition systems are increasingly deployed in law enforcement and security contexts, where algorithmic decisions can carry significant societal consequences. Despite high reported accuracy, growing evidence demonstrates that such systems often exhibit uneven performance across demographic groups, leading to disproportionate error rates and potential harm. This paper argues that aggregate accuracy is an insufficient metric for evaluating the fairness and reliability of facial recognition systems in high-stakes environments. Through analysis of subgroup-level error distribution, including false positive rate (FPR) and false negative rate (FNR), the paper demonstrates how aggregate performance metrics can obscure critical disparities across demographic groups. Empirical observations show that systems with similar overall accuracy can exhibit substantially different fairness profiles, with subgroup error rates varying significantly despite a single aggregate metric. The paper further examines the operational risks associated with accuracy-centric evaluation practices in law enforcement applications, where misclassification may result in wrongful suspicion or missed identification. It highlights the importance of fairness-aware evaluation approaches and model-agnostic auditing strategies that enable post-deployment assessment of real-world systems. The findings emphasise the need to move beyond accuracy as a primary metric and adopt more comprehensive evaluation frameworks for responsible AI deployment.