Feb 16, 2026

arXiv:2602.14835

The Global Representativeness Index: A Total Variation Distance Framework for Measuring Demographic Fidelity in Survey Research

AI Summary

This paper introduces the Global Representativeness Index (GRI), a metric based on Total Variation Distance that quantifies how closely a survey sample's demographic composition matches population benchmarks. Response rates and demographic quotas measure effort and coverage rather than distributional fidelity; the GRI instead gives a [0, 1] score of distributional similarity across multiple demographic dimensions. Validation on Global Dialogues finds fine-grained GRI scores of 0.33 to 0.36, and cross-validation on the World Values Survey, Afrobarometer, and Latinobarometro finds scores below 0.22 on fine-grained global demographics when country coverage is limited, showing how hard fine-grained global representativeness is to achieve even in large-scale surveys.

Key Contribution

Even large, widely used probability surveys score below 0.22 on the new Global Representativeness Index for fine-grained global demographics when their country coverage is limited.

Abstract

Global survey research increasingly informs high-stakes decisions in AI governance and cross-cultural policy, yet no standardized metric quantifies how well a sample's demographic composition matches its target population. Response rates and demographic quotas, the prevailing proxies for sample quality, measure effort and coverage but not distributional fidelity. This paper introduces the Global Representativeness Index (GRI), a framework grounded in Total Variation Distance that scores any survey sample against population benchmarks across multiple demographic dimensions on a [0, 1] scale. Validation on seven waves of the Global Dialogues survey (N = 7,500 across 60+ countries) finds fine-grained demographic GRI scores of only 0.33 to 0.36, roughly 43% of the theoretical maximum at that sample size. Cross-validation on the World Values Survey (seven waves, N = 403,000), Afrobarometer Round 9 (N = 53,000), and Latinobarometro (N = 19,000) reveals that even large probability surveys score below 0.22 on fine-grained global demographics when country coverage is limited. The GRI connects to classical survey statistics through the design effect; both metrics are recommended as a minimum summary of sample quality, since GRI quantifies demographic distance symmetrically while effective N captures the asymmetric inferential cost of underrepresentation. The framework is released as an open-source Python library with UN and Pew Research Center population benchmarks, applicable to survey research, machine learning dataset auditing, and AI evaluation benchmarks.
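To make the two summary metrics in the abstract concrete, here is a minimal sketch. It does not use the paper's released library, and the function names (`total_variation_distance`, `gri_score`, `kish_effective_n`) are hypothetical. It assumes the GRI is one minus the Total Variation Distance between sample and population proportions over a shared set of demographic strata, which is the natural reading of a TVD-based [0, 1] score but is not spelled out in the abstract. It also assumes "effective N" means Kish's effective sample size under simple post-stratification weights. The strata and numbers are toy values, not data from the paper.

```python
def total_variation_distance(sample_props, population_props):
    """Half the L1 distance between two discrete distributions over strata."""
    strata = set(sample_props) | set(population_props)
    return 0.5 * sum(
        abs(sample_props.get(k, 0.0) - population_props.get(k, 0.0))
        for k in strata
    )


def gri_score(sample_props, population_props):
    # Assumption: score = 1 - TVD, so 1.0 means a perfect demographic match.
    return 1.0 - total_variation_distance(sample_props, population_props)


def kish_effective_n(sample_counts, population_props):
    """Kish effective sample size, (sum w)^2 / sum w^2, where each respondent
    in stratum k gets the weight population share / sample share.
    Strata with no respondents cannot be weighted and are skipped."""
    n = sum(sample_counts.values())
    sum_w = 0.0
    sum_w2 = 0.0
    for k, count in sample_counts.items():
        if count == 0:
            continue
        w = population_props.get(k, 0.0) / (count / n)
        sum_w += count * w
        sum_w2 += count * w * w
    return sum_w ** 2 / sum_w2


# Toy example with three hypothetical strata.
population = {"A": 0.5, "B": 0.3, "C": 0.2}
counts = {"A": 70, "B": 20, "C": 10}
n_total = sum(counts.values())
sample = {k: c / n_total for k, c in counts.items()}

gri = gri_score(sample, population)
n_eff = kish_effective_n(counts, population)
print(round(gri, 3))    # 0.8
print(round(n_eff, 1))  # 82.8
```

In this toy case, TVD treats the 20-point overrepresentation of stratum A and the combined 20-point shortfall in B and C identically. The effective sample size instead falls from 100 to about 83, because respondents in the underrepresented strata need weights above 1.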

Constitutional AI & AI Ethics

Data Curation & Synthetic Data

Eval Frameworks & Benchmarks

Year: 2026

Venue: N/A
