This paper introduces four new evaluation suites to systematically study LLMs' ability to use and understand subtext in communicative settings, ranging from allegory interpretation to multi-agent games. The authors find that state-of-the-art LLMs exhibit a strong bias towards literal communication, struggling to account for nuanced constraints and common ground. However, performance improves when common ground is explicitly provided, and paratextual cues influence allegory interpretation, highlighting both the limitations and potential of LLMs in understanding subtext.
LLMs struggle to grasp subtext (even generating literal clues 60% of the time), revealing a critical gap in their ability to understand nuanced human communication.
Human communication is fundamentally creative and often makes use of subtext -- implied meaning that goes beyond the literal content of the text. Here, we systematically study whether language models can use subtext in communicative settings, and introduce four new evaluation suites to assess these capabilities. Our evaluation settings range from writing and interpreting allegories to playing multi-agent and multi-modal games inspired by the rules of board games like Dixit. We find that frontier models generally exhibit a strong bias towards overly literal, explicit communication and thereby fail to account for nuanced constraints: even the best-performing models generate literal clues 60% of the time in one of our environments, Visual Allusions. However, we find that some models can sometimes make use of common ground with another party to help them communicate with subtext, achieving a 30%-50% reduction in overly literal clues; but they struggle to infer the presence of common ground when it is not explicitly stated. For allegory understanding, we find that paratextual and persona conditions significantly shift the interpretation of subtext. Overall, our work provides quantifiable measures for an inherently complex and subjective phenomenon like subtext, and reveals many weaknesses and idiosyncrasies of current LLMs. We hope this research inspires future work on socially grounded creative communication and reasoning.