The paper introduces CoMoral, a benchmark dataset that evaluates how well LLMs balance moral reasoning and commonsense understanding by embedding commonsense contradictions within moral dilemmas. Experiments on ten LLMs of different sizes show that the models consistently fail to identify these contradictions unless given a prior signal, suggesting that they prioritize moral reasoning over commonsense understanding. The authors also identify a "narrative focus bias": LLMs detect contradictions more readily when they are attributed to a secondary character than to the primary (narrator) character.
LLMs often prioritize moral reasoning over basic common sense, especially when the commonsense contradiction is attributed to the main character of the narrative.
Large Language Models (LLMs) are increasingly deployed across diverse real-world applications and user communities. As such, it is crucial that these models remain both morally grounded and knowledge-aware. In this work, we uncover a critical limitation of current LLMs -- their tendency to prioritize moral reasoning over commonsense understanding. To investigate this phenomenon, we introduce CoMoral, a novel benchmark dataset containing commonsense contradictions embedded within moral dilemmas. Through extensive evaluation of ten LLMs across different model sizes, we find that existing models consistently struggle to identify such contradictions without prior signal. Furthermore, we observe a pervasive narrative focus bias, wherein LLMs more readily detect commonsense contradictions when they are attributed to a secondary character rather than the primary (narrator) character. Our comprehensive analysis underscores the need for enhanced reasoning-aware training to improve the commonsense robustness of large language models.