This paper introduces four new evaluation suites to systematically study LLMs' ability to use and understand subtext in communicative settings, ranging from allegory interpretation to multi-agent games. The authors find that state-of-the-art LLMs exhibit a strong bias towards literal communication, struggling to account for nuanced constraints and common ground. However, performance improves when common ground is explicitly provided, and paratextual cues influence allegory interpretation, highlighting both the limitations and potential of LLMs in understanding subtext.
LLMs struggle to grasp subtext (even generating literal clues 60% of the time), revealing a critical gap in their ability to understand nuanced human communication.
Human communication is fundamentally creative and often makes use of subtext -- implied meaning that goes beyond the literal content of the text. Here, we systematically study whether language models can use subtext in communicative settings, and introduce four new evaluation suites to assess these capabilities. Our evaluation settings range from writing and interpreting allegories to playing multi-agent and multi-modal games inspired by the rules of board games like Dixit. We find that frontier models generally exhibit a strong bias towards overly literal, explicit communication and thereby fail to account for nuanced constraints: even the best-performing models generate literal clues 60% of the time in one of our environments, Visual Allusions. However, we find that some models can sometimes make use of common ground with another party to help them communicate with subtext, achieving a 30%-50% reduction in overly literal clues; but they struggle to infer the presence of common ground when it is not explicitly stated. For allegory understanding, we find that paratextual and persona conditions significantly shift the interpretation of subtext. Overall, our work provides quantifiable measures for an inherently complex and subjective phenomenon like subtext, and reveals many weaknesses and idiosyncrasies of current LLMs. We hope this research inspires future work on socially grounded creative communication and reasoning.