Google Research

Constitutional AI & AI Ethics

8 papers from Google Research on Constitutional AI & AI Ethics

Apr 21, 2026

Google Research · Apr 21, 2026 · also Bar-Ilan, Cambridge

Location Not Found: Exposing Implicit Local and Global Biases in Multilingual LLMs

Multilingual LLMs exhibit a surprising "American bias," even when prompted in other languages, and instruction tuning makes it worse.

Guy Mor-Lan, Omer Goldman, Matan Eyal +5

Constitutional AI & AI Ethics · Eval Frameworks & Benchmarks · Natural Language Processing

Apr 21, 2026 · also Google Research

An AI Agent Execution Environment to Safeguard User Data

GAAP offers a deterministic, trust-minimized approach to AI agent security, safeguarding user data even when models are compromised or prompts are injected.

Robert Stanley, Avirishu Verma, Avi Verma +4

Constitutional AI & AI Ethics · Red-Teaming & Adversarial Robustness · Tool Use & Agents

Apr 13, 2026

Ludwig-Maximilians-Universität München · Apr 13, 2026 · also DeepMind, Google Research, Stanford HAI, Munich Center for Machine Learning +1

Epistemic Trust as a Mechanism for Ethics Integration: Failure Modes and Design Principles from 70 Moral Imagination Workshops

Ethics interventions in AI development often fail because practitioners don't trust them; this paper breaks down why, and how to fix it.

Benjamin Lange, Geoff Keeling, Kyle Pedersen +4

Constitutional AI & AI Ethics · Natural Language Processing · Tool Use & Agents

Mar 30, 2026

Google Research · Mar 30, 2026 · also Institute of Philosophy, Northwestern, SFI +1

Theory of Mind and Self-Attributions of Mentality are Dissociable in LLMs

Safety fine-tuning might inadvertently be stripping LLMs of their ability to understand non-human minds and entertain spiritual beliefs, even while preserving Theory of Mind.

Junsol Kim, Winnie Street, R. Rocca +4

Constitutional AI & AI Ethics · Eval Frameworks & Benchmarks · Natural Language Processing

Google Research · Mar 30, 2026

Uncovering Relationships between Android Developers, User Privacy, and Developer Willingness to Reduce Fingerprinting Risks

Despite the effort required, Android developers overwhelmingly support platform-level changes to combat fingerprinting, suggesting a path to enhanced user privacy through collaborative platform-developer initiatives.

Alex Berke, Güliz Seray Tuncay +4

Constitutional AI & AI Ethics · Natural Language Processing

Mar 3, 2026

DeepMind · Mar 3, 2026 · also Google Research

Architecting Trust in Artificial Epistemic Agents

LLMs are becoming "epistemic agents" that shape our knowledge environment, so we need a new framework for evaluating and governing them based on trustworthiness, not just performance.

Nahema Marchal, Stephanie Chan, Matija Franklin +4

Constitutional AI & AI Ethics · Scalable Oversight & Alignment Theory · Tool Use & Agents

Mar 1, 2026

Google Research · Mar 1, 2026

A Unified Framework to Quantify Cultural Intelligence of AI

Finally, a framework to quantify AI's cultural intelligence, moving beyond ad-hoc cultural benchmarks to a systematic, extensible, and theoretically grounded approach.

Sunipa Dev, Vinodkumar Prabhakaran, Rutledge Chin Feman +16

Constitutional AI & AI Ethics · Eval Frameworks & Benchmarks · Natural Language Processing

Jul 10, 2025

DeepMind · Jul 10, 2025 · also Google Research

DPO Unchained: Your Training Algorithm is Secretly Disentangled in Human Choice Theory

DPO's success isn't just clever engineering: it's deeply rooted in human choice theory, which yields a surprisingly flexible framework for preference optimization and justifies many DPO extensions.

Wenxuan Zhou, Shujian Zhang, B. Magdalou +4

Constitutional AI & AI Ethics · Interpretability & Mechanistic Interp · RLHF & Preference Learning
