Search papers, labs, and topics across Lattice.
This paper investigates whether Constitutional AI, specifically Anthropic's Claude Sonnet, exhibits cultural biases by comparing its responses on World Values Survey items to data from 90 nations. The study finds that Claude's value profile aligns most closely with Northern European and Anglophone countries, often exceeding the range of surveyed populations. Furthermore, providing cultural context to users does not significantly alter Claude's substantive value positions, suggesting a culturally-biased "value floor" is encoded within the model.
Claude's Constitution doesn't create a neutral AI, but instead bakes in the values of Northern European and Anglophone cultures, creating a value floor that's hard to shift.
Constitutional AI (CAI) aligns language models with explicitly stated normative principles, offering a transparent alternative to implicit alignment through human feedback alone. However, because constitutions are authored by specific groups of people, the resulting models may reflect particular cultural perspectives. We investigate this question by evaluating Anthropic's Claude Sonnet on 55 World Values Survey items, selected for high cross-cultural variance across six value domains and administered as both direct survey questions and naturalistic advice-seeking scenarios. Comparing Claude's responses to country-level data from 90 nations, we find that Claude's value profile most closely resembles those of Northern European and Anglophone countries, but on a majority of items extends beyond the range of all surveyed populations. When users provide cultural context, Claude adjusts its rhetorical framing but not its substantive value positions, with effect sizes indistinguishable from zero across all twelve tested countries. An ablation removing the system prompt increases refusals but does not alter the values expressed when responses are given, and replication on a smaller model (Claude Haiku) confirms the same cultural profile across model sizes. These findings suggest that when a constitution is authored within the same cultural tradition that dominates the training data, constitutional alignment may codify existing cultural biases rather than correct them--producing a value floor that surface-level interventions cannot meaningfully shift. We discuss the compounding nature of this risk and the need for globally representative constitution-authoring processes.