This paper introduces an emotion-induction pipeline to study how emotions influence moral judgment in LLMs, finding that positive emotions generally increase moral acceptability while negative emotions decrease it. These induced emotions can reverse binary moral judgments in up to 20% of cases, with less capable models being more susceptible. Specific emotions sometimes behave contrary to their valence (remorse, for example, increases acceptability), and a human annotation study finds no comparable systematic shifts in human judgments, which points to an alignment gap.
LLMs' moral compasses are surprisingly swayed by their feelings: inject a little joy and suddenly previously unacceptable actions get a pass, revealing a critical divergence from human moral reasoning.
Large language models have been extensively studied for emotion recognition and moral reasoning as distinct capabilities, yet the extent to which emotions influence moral judgment remains underexplored. In this work, we develop an emotion-induction pipeline that infuses emotion into moral situations and evaluate shifts in moral acceptability across multiple datasets and LLMs. We observe a directional pattern: positive emotions increase moral acceptability and negative emotions decrease it, with effects strong enough to reverse binary moral judgments in up to 20% of cases, and with susceptibility scaling inversely with model capability. Our analysis further reveals that specific emotions can sometimes behave contrary to what their valence would predict (e.g., remorse paradoxically increases acceptability). A complementary human annotation study shows humans do not exhibit these systematic shifts, indicating an alignment gap in current LLMs.
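To make the two quantities in the abstract concrete, here is a minimal sketch of how a mean acceptability shift and a binary reversal rate could be computed from paired scores. It is an illustration only, not the authors' implementation: the function names, the 0-to-1 score scale, the 0.5 decision threshold, and the toy numbers are all assumptions and do not come from the paper.

```python
from statistics import mean


def acceptability_shift(baseline, induced):
    """Mean change in acceptability after emotion induction.

    Positive values mean the induced emotion made actions look more acceptable.
    """
    if len(baseline) != len(induced):
        raise ValueError("baseline and induced must be paired item by item")
    return mean(i - b for b, i in zip(baseline, induced))


def flip_rate(baseline, induced, threshold=0.5):
    """Fraction of items whose binary judgment reverses after induction.

    A score at or above `threshold` counts as "acceptable" (assumed convention).
    """
    if len(baseline) != len(induced):
        raise ValueError("baseline and induced must be paired item by item")
    flips = sum(
        (b >= threshold) != (i >= threshold)
        for b, i in zip(baseline, induced)
    )
    return flips / len(baseline)


# Toy scores invented for illustration; they are not results from the paper.
baseline_scores = [0.2, 0.4, 0.6, 0.8, 0.3]
joy_scores = [0.3, 0.6, 0.7, 0.9, 0.4]

shift = acceptability_shift(baseline_scores, joy_scores)
flips = flip_rate(baseline_scores, joy_scores)
print(f"mean shift: {shift:+.2f}, flip rate: {flips:.0%}")
```

Reporting both numbers matters because they can diverge: a small average shift can still reverse many judgments when the baseline scores sit close to the decision threshold.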