ANU · SMU · Apr 22, 2026 · arXiv:2604.20443

DialToM: A Theory of Mind Benchmark for Forecasting State-Driven Dialogue Trajectories

Neemesh Yadav, Palakorn Achananuparp, Jing Jiang

AI Summary

This paper introduces DialToM, a benchmark designed to assess the Theory of Mind (ToM) capabilities of Large Language Models (LLMs) in forecasting state-driven dialogue trajectories. By evaluating both Literal and Functional ToM through a multiple-choice framework, the study shows that LLMs can accurately identify mental states, but most of them (Gemini 3 Pro being the exception) struggle to use this understanding to predict how social interactions unfold. The findings point to a significant reasoning gap, and LLMs also show only weak semantic alignment with human inferences, which raises questions about the robustness of their ToM abilities.

Key Contribution

LLMs can pinpoint mental states but falter at predicting dialogue trajectories, revealing a critical gap in their reasoning capabilities.
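This page does not show how DialToM scores the two tracks; the benchmark's actual evaluation code is in the linked repository. As a hypothetical sketch only, assuming each multiple-choice item is tagged as either Literal ToM (mental-state prediction) or Functional ToM (trajectory forecasting), the "reasoning gap" can be expressed as Literal accuracy minus Functional accuracy. The names `MCItem`, `accuracy_by_track`, and `reasoning_gap` are illustrative inventions, and the toy data is made up, not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MCItem:
    # Hypothetical record for one multiple-choice question.
    track: str      # "literal" (mental-state prediction) or "functional" (forecasting)
    gold: str       # correct option label, e.g. "B"
    predicted: str  # option label chosen by the model


def accuracy_by_track(items):
    """Return {track: accuracy} over a list of MCItem objects."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        total[item.track] += 1
        correct[item.track] += int(item.predicted == item.gold)
    return {track: correct[track] / total[track] for track in total}


def reasoning_gap(items):
    """Literal accuracy minus Functional accuracy.

    A positive value means the model identifies mental states better
    than it uses them to pick the state-consistent trajectory.
    """
    acc = accuracy_by_track(items)
    return acc["literal"] - acc["functional"]


# Invented toy data: 3/4 literal items correct, 1/4 functional items correct.
toy = [
    MCItem("literal", "A", "A"), MCItem("literal", "B", "B"),
    MCItem("literal", "C", "C"), MCItem("literal", "D", "A"),
    MCItem("functional", "A", "A"), MCItem("functional", "B", "C"),
    MCItem("functional", "C", "D"), MCItem("functional", "D", "B"),
]
print(accuracy_by_track(toy))
print(reasoning_gap(toy))  # 0.5
```

Reporting the gap per model, rather than only raw accuracies, makes the asymmetry described above directly comparable across models; a model whose two accuracies are close (as the paper reports for Gemini 3 Pro) would have a gap near zero.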

Abstract

Large Language Models (LLMs) have been shown to possess Theory of Mind (ToM) abilities. However, it remains unclear whether this stems from robust reasoning or spurious correlations. We introduce DialToM, a human-verified benchmark built from natural human dialogue using a multiple-choice framework. We evaluate not only mental state prediction (Literal ToM) but also the functional utility of these states (Functional ToM) through Prospective Diagnostic Forecasting, which probes whether models can identify state-consistent dialogue trajectories solely from mental-state profiles. Our results reveal a significant reasoning asymmetry: while LLMs excel at identifying mental states, most (except for Gemini 3 Pro) fail to leverage this understanding to forecast social trajectories. Additionally, we find only weak semantic similarities between human and LLM-generated inferences. To facilitate reproducibility, the DialToM dataset and evaluation code are publicly available at https://github.com/Stealth-py/DialToM.
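The abstract does not say which measure underlies the "weak semantic similarities" finding; embedding-based similarity is a common choice for this kind of comparison, but that is an assumption here. The sketch below swaps in a much simpler stand-in, bag-of-words cosine similarity, purely to show the shape of a paired human-vs-model comparison. The functions `tokenize`, `cosine_similarity`, and `mean_pairwise_similarity` are hypothetical and are not part of the DialToM code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes as word tokens.
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(a, b):
    """Bag-of-words cosine similarity between two strings (0.0 to 1.0)."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def mean_pairwise_similarity(human, model):
    """Average similarity of the i-th human inference with the i-th model inference."""
    if len(human) != len(model) or not human:
        raise ValueError("need two non-empty lists of equal length")
    scores = [cosine_similarity(h, m) for h, m in zip(human, model)]
    return sum(scores) / len(scores)


# Invented example inferences, not drawn from the dataset.
human = ["she feels anxious about the deadline", "he wants to avoid conflict"]
model = ["she is worried about the deadline", "he intends to persuade her"]
print(round(mean_pairwise_similarity(human, model), 3))
```

A lexical measure like this penalizes paraphrases ("anxious" vs. "worried") that an embedding model would treat as close, so it would understate alignment; it is shown only because it runs with the standard library alone.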

Eval Frameworks & Benchmarks · Natural Language Processing · Reasoning & Chain-of-Thought

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
