The paper introduces DynToM, a new benchmark that evaluates Large Language Models' (LLMs') ability to track the temporal evolution of human mental states across interconnected social scenarios, addressing a limitation of existing Theory of Mind (ToM) benchmarks, which test only static snapshots. Using a four-step framework, the authors generate 1,100 social contexts comprising 5,500 scenarios and 78,100 questions, each validated for realism and quality. An evaluation of ten state-of-the-art LLMs on DynToM shows they underperform humans by 44.7% on average, with the sharpest degradation when tracking shifts in mental states, indicating fundamental limitations in modeling dynamic human cognition.
LLMs are surprisingly bad at keeping up with how people's minds change over time, lagging humans by nearly 45% on a new benchmark designed to test this crucial social skill.
As Large Language Models (LLMs) increasingly participate in human-AI interactions, evaluating their Theory of Mind (ToM) capabilities, particularly their ability to track dynamic mental states, becomes crucial. While existing benchmarks assess basic ToM abilities, they predominantly focus on static snapshots of mental states, overlooking the temporal evolution that characterizes real-world social interactions. We present DynToM, a novel benchmark specifically designed to evaluate LLMs' ability to understand and track the temporal progression of mental states across interconnected scenarios. Through a systematic four-step framework, we generate 1,100 social contexts encompassing 5,500 scenarios and 78,100 questions, each validated for realism and quality. Our comprehensive evaluation of ten state-of-the-art LLMs reveals that their average performance underperforms humans by 44.7%, with performance degrading significantly when tracking and reasoning about the shift of mental states. This performance gap highlights fundamental limitations in current LLMs' ability to model the dynamic nature of human mental states.
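The abstract describes a nested layout: contexts contain ordered scenarios, and scenarios carry questions about mental states at that point in the sequence. A minimal sketch of that structure, assuming illustrative field names (the paper's actual schema is not given here):

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the nested benchmark layout described in the
# abstract: each social context holds several interconnected scenarios,
# and each scenario carries questions probing mental states at that
# point in time. All names below are assumptions for illustration.

@dataclass
class Question:
    text: str
    answer: str

@dataclass
class Scenario:
    order: int                                   # position in the temporal sequence
    description: str
    questions: list[Question] = field(default_factory=list)

@dataclass
class Context:
    background: str
    scenarios: list[Scenario] = field(default_factory=list)

def count_items(contexts: list[Context]) -> tuple[int, int, int]:
    """Return (#contexts, #scenarios, #questions) for a benchmark split."""
    n_scenarios = sum(len(c.scenarios) for c in contexts)
    n_questions = sum(len(s.questions) for c in contexts for s in c.scenarios)
    return len(contexts), n_scenarios, n_questions
```

At the reported scale, 1,100 contexts and 5,500 scenarios imply an average of 5 scenarios per context, and 78,100 questions imply roughly 14 questions per scenario.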