5 papers from Google DeepMind on Constitutional AI & AI Ethics
LLMs get *more* honest when given time to reason, the opposite of the human tendency, revealing surprising insights about their internal representational geometry.
LLMs are becoming "epistemic agents" that shape our knowledge environment, so we need a new framework for evaluating and governing them based on trustworthiness, not just performance.
Reasoning-based safety guardrails, once thought to be a strong defense against jailbreaks, crumble with just a few strategically placed tokens.
DPO's success isn't just clever engineering: it's deeply rooted in human choice theory, unlocking a surprisingly flexible framework for preference optimization and justifying many DPO extensions.
LVLMs struggle to navigate cultural nuances, with even the best models achieving only 62% awareness and 38% compliance on a new benchmark spanning 16 countries.