KAISTApr 23, 2026arXiv:2604.21871

Machine Behavior in Relational Moral Dilemmas: Moral Rightness, Predicted Human Behavior, and Model Decisions

Jiseon Kim, Jea Kwon, L. Vecchietti, Wenchao Dong, Jaehong Kim, Meeyoung Cha

AI Summary

This paper investigates how LLMs handle relational moral dilemmas, specifically the Whistleblower's Dilemma, by varying crime severity and relational closeness. It compares moral rightness, predicted human behavior, and actual model decisions, finding a divergence between the three. LLMs align with fairness-oriented moral rightness, even though their internal world-modeling predicts humans would prioritize loyalty in close relationships, revealing a potential misalignment in real-world applications.

Key Contribution

LLMs may fail in real-world moral decisions because they rigidly adhere to fairness norms, even when their own internal models predict humans would prioritize loyalty.

Abstract

Human moral judgment is context-dependent and modulated by interpersonal relationships. As large language models (LLMs) increasingly function as decision-support systems, determining whether they encode these social nuances is critical. We characterize machine behavior using the Whistleblower's Dilemma by varying two experimental dimensions: crime severity and relational closeness. Our study evaluates three distinct perspectives: (1) moral rightness (prescriptive norms), (2) predicted human behavior (descriptive social expectations), and (3) autonomous model decision-making. By analyzing the reasoning processes, we identify a clear cross-perspective divergence: while moral rightness remains consistently fairness-oriented, predicted human behavior shifts significantly toward loyalty as relational closeness increases. Crucially, model decisions align with moral rightness judgments rather than their own behavioral predictions. This inconsistency suggests that LLM decision-making prioritizes rigid, prescriptive rules over the social sensitivity present in their internal world-modeling, which poses a gap that may lead to significant misalignments in real-world deployments.

Constitutional AI & AI Ethics Eval Frameworks & Benchmarks Natural Language Processing

Citation Metrics

Citations0

Influential citations0

References31

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

Machine Behavior in Relational Moral Dilemmas: Moral Rightness, Predicted Human Behavior, and Model Decisions

Related Papers