The paper introduces the Distributed Partial Information Puzzle (DPIP), a collaborative task designed to elicit multimodal communication and examine common ground construction under epistemic asymmetry. A multimodal dataset of these interactions, annotated across speech, gesture, and action, is presented to support reasoning about propositional content and belief dynamics. Evaluation of LLMs and a Dynamic Epistemic Logic (DEL) pipeline on the DPIP data reveals that LLMs struggle to track task progression and belief state in this context.
LLMs falter at tracking shared beliefs and task progression in a novel multimodal collaboration task, highlighting a gap in their ability to reason under epistemic asymmetry.
Establishing common ground, a shared set of beliefs and mutually recognized facts, is fundamental to collaboration, yet remains a challenge for current AI systems, especially in multimodal, multiparty settings where the collaborators bring different information to the table. We introduce the Distributed Partial Information Puzzle (DPIP), a collaborative construction task that elicits rich multimodal communication under epistemic asymmetry. We present a multimodal dataset of these interactions, annotated and temporally aligned across speech, gesture, and action modalities to support reasoning over propositional content and belief dynamics. We then evaluate two paradigms for modeling common ground (CG): (1) state-of-the-art large language models (LLMs), prompted to infer shared beliefs from multimodal updates, and (2) an axiomatic pipeline grounded in Dynamic Epistemic Logic (DEL) that incrementally performs the same task. Results on the annotated DPIP data indicate that it poses a challenge to modern LLMs' abilities to track both task progression and belief state.
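To make the idea of incrementally tracking common ground under epistemic asymmetry concrete, here is a minimal toy sketch. It is not the paper's DEL pipeline: the class `BeliefTracker`, its methods, the agent names, and the proposition strings are all hypothetical, and common ground is approximated as the plain intersection of each agent's belief set, which is much weaker than common knowledge in a full DEL model.

```python
from dataclasses import dataclass, field


@dataclass
class BeliefTracker:
    """Toy tracker (hypothetical): each agent holds a set of believed propositions."""

    agents: tuple[str, ...]
    beliefs: dict[str, set[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every agent starts with an empty belief set unless one was supplied.
        for agent in self.agents:
            self.beliefs.setdefault(agent, set())

    def private_observation(self, agent: str, proposition: str) -> None:
        # Only the observing agent updates, which creates epistemic asymmetry.
        self.beliefs[agent].add(proposition)

    def public_announcement(self, proposition: str) -> None:
        # Simplifying assumption: every agent perceives and accepts the announcement.
        for agent in self.agents:
            self.beliefs[agent].add(proposition)

    def common_ground(self) -> set[str]:
        # Approximation: propositions believed by all agents.
        return set.intersection(*(self.beliefs[a] for a in self.agents))


tracker = BeliefTracker(agents=("agent_a", "agent_b", "agent_c"))
tracker.private_observation("agent_a", "red_block_on_left")
tracker.public_announcement("blue_block_at_base")
cg = tracker.common_ground()
```

In this sketch the privately observed proposition stays out of `cg` until it is announced to everyone, which is the kind of update a belief-tracking model has to follow as speech, gesture, and action events arrive.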