This paper investigates how lexical anthropomorphism (LA) affects moral judgments of misbehaving AI across four experiments (total N = 1,020). Humanizing language and design cues had little influence on moral evaluations of AI misconduct. Instead, the type of moral violation the AI committed was the strongest predictor of moral judgments, with harm and degradation violations producing the broadest negative character assessments.
Turns out, calling an AI "he" or giving it a human-like avatar doesn't significantly change how harshly we judge its misdeeds; the type of violation the AI commits matters far more.
Anthropomorphic language describing artificial intelligence (AI) is widespread in media, policy, and everyday discourse; so too are discussions of AI bad behavior, from hallucinations to inappropriate comments. How does humanizing language about AI shape moral judgments when AI behaves badly? Across four experiments (total N = 1,020), we tested whether lexical anthropomorphism (LA) primes shape judgments of AI moral character, behavior morality, and behavioral responsibility. Studies 1-3 tested interactions between anthropomorphic language and humanizing design cues (icons, names, self-referencing) in the context of amoral errors. Study 4 extended this to genuinely immoral AI behavior across seven moral-violation types. Results indicate humanizing language and design cues have little influence on moral judgments of misbehaving AI. Where effects emerged, high-anthropomorphic primes elevated perceptions of an AI's capacity for dishonesty. The type of moral violation observed was the strongest predictor of moral judgments, with harm and degradation violations producing the broadest negative character assessments. Prime drift, horn effects, and egoistic value orientations emerged as potentially important predictors of AI moral judgments.