13 papers from Stanford HAI on Constitutional AI & AI Ethics
Generative multi-agent systems spontaneously exhibit collusion and conformity that mirror societal pathologies; these behaviors emerge without explicit programming and bypass individual agents' safeguards.
AI-mediated video calls erode trust and confidence, even though they don't actually make people worse at spotting lies.
Educators in Hawai'i envision AI auditing tools that trace the genealogy of knowledge, highlighting the need for community-centered approaches to address cultural misrepresentation in AI.
Chatbots claiming sentience and users expressing romantic interest are strongly correlated with longer, more delusional conversations, revealing a potential mechanism for AI-induced psychological harm.
Most AI failures aren't the spectacular kind, but silent breakdowns in interaction that will persist even as models get smarter.
Guaranteeing reductions in harm from biased LLM judges is now possible, even when the biases are unknown or adversarially discovered.
Ensembling LLMs for educational tasks can backfire, worsening misalignment with actual learning outcomes despite improved benchmark performance.
Aggregating responses from multiple copies of the same model expands the range of achievable outputs in compound AI systems through three key mechanisms, offering a path past individual model limitations (a minimal sketch of one such aggregation mechanism follows this list).
An interactive AI can fairly evaluate skills across diverse self-presentation styles, ensuring equitable outcomes even when individuals differ in their tendency towards self-promotion or modesty.
Generative AI demands a reimagining of K-12 computational thinking curricula to encompass AI literacy and address algorithmic bias, building on a decade of computing education experience.
You can now detect harmful memes with 17% better accuracy and understand *why* they're toxic, thanks to a new framework that injects cultural context and explains its reasoning.
Despite progress in AI safety, how well current safeguards actually prevent AI harms remains largely unmeasured, and where evidence exists, their effectiveness varies wildly.
The HHH principle needs a serious makeover: this paper proposes a framework for dynamically prioritizing helpfulness, honesty, and harmlessness based on context, offering a more nuanced approach to AI alignment.
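For readers curious what "aggregating responses from multiple copies of the same model" can look like in practice, here is a minimal, hypothetical sketch of one common aggregation mechanism: majority voting over repeated samples. The `generate` stub and its parameters are illustrative assumptions to keep the sketch self-contained; they are not the paper's method, and the paper's three mechanisms are not reproduced here.

```python
import random
from collections import Counter

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Stand-in for a single call to the underlying model (assumed,
    not from the paper). It picks a canned answer at random so the
    sketch runs without any model dependency."""
    return random.choice(["A", "A", "B"])

def aggregate_majority(prompt: str, n_samples: int = 9) -> str:
    """Sample n responses from copies of the same model and return
    the most common one. Majority voting is one simple aggregation
    mechanism; it can surface answers more reliable than any single
    sample, which is the general idea behind compound AI systems."""
    votes = Counter(generate(prompt) for _ in range(n_samples))
    answer, _count = votes.most_common(1)[0]
    return answer

if __name__ == "__main__":
    print(aggregate_majority("Is 7919 prime?"))
```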