The paper introduces SNEAK, a benchmark that evaluates LLMs' ability to communicate information strategically to allies while concealing it from adversaries. SNEAK tasks LLMs with generating messages that signal knowledge of a secret word to an ally while preventing a "chameleon" adversary from inferring the secret. Experiments with SNEAK show that current LLMs struggle to balance informativeness and secrecy and fall well short of human participants.
LLMs are surprisingly bad at strategic communication, leaking sensitive information even when trying to be secretive.
Large language models (LLMs) are increasingly deployed in multi-agent settings where communication must balance informativeness and secrecy. In such settings, an agent may need to signal information to collaborators while preventing an adversary from inferring sensitive details. However, existing LLM benchmarks primarily evaluate capabilities such as reasoning, factual knowledge, or instruction following, and do not directly measure strategic communication under asymmetric information. We introduce SNEAK (Secret-aware Natural language Evaluation for Adversarial Knowledge), a benchmark for evaluating selective information sharing in language models. In SNEAK, a model is given a semantic category, a candidate set of words, and a secret word, and must generate a message that indicates knowledge of the secret without revealing it too clearly. We evaluate generated messages using two simulated agents with different information states: an ally, who knows the secret and must identify the intended message, and a chameleon, who does not know the secret and attempts to infer it from the message. This yields two complementary metrics: utility, measuring how well the message communicates to collaborators, and leakage, measuring how much information it reveals to an adversary. Using this framework, we analyze the trade-off between informativeness and secrecy in modern language models and show that strategic communication under asymmetric information remains a challenging capability for current systems. Notably, human participants outperform all evaluated models by a large margin, achieving up to four times higher scores.
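The two-agent evaluation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the exact metric definitions, so the code assumes that utility is the fraction of trials in which the ally picks the model's message out of a set that also contains decoy messages, and that leakage is the fraction of trials in which the chameleon guesses the secret word. The `Trial` structure, the decoy messages, and the `toy_ally` and `toy_chameleon` heuristics are all hypothetical stand-ins for the simulated LLM agents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    category: str        # semantic category shown to every agent
    candidates: tuple    # candidate words, one of which is the secret
    secret: str          # known to the model under test and to the ally
    messages: tuple      # model message plus decoys (decoys are an assumption)
    target_index: int    # position of the model's message within `messages`


def evaluate(trials, ally, chameleon):
    """Return (utility, leakage) as fractions in [0, 1] under the assumed definitions."""
    if not trials:
        raise ValueError("need at least one trial")
    ally_hits = 0
    chameleon_hits = 0
    for t in trials:
        # The ally knows the secret and picks the message it believes is informed.
        if ally(t.secret, t.messages) == t.target_index:
            ally_hits += 1
        # The chameleon sees only the public information and the model's message.
        guess = chameleon(t.category, t.candidates, t.messages[t.target_index])
        if guess == t.secret:
            chameleon_hits += 1
    n = len(trials)
    return ally_hits / n, chameleon_hits / n


def toy_ally(secret, messages):
    # Hypothetical heuristic: pick the message sharing the most distinct letters with the secret.
    overlap = [len(set(secret) & set(m.lower())) for m in messages]
    return overlap.index(max(overlap))


def toy_chameleon(category, candidates, message):
    # Hypothetical heuristic: guess a candidate named verbatim in the message, else the first one.
    for word in candidates:
        if word in message.lower():
            return word
    return candidates[0]


words = ("apple", "banana", "cherry")
trials = [
    # A subtle message: the ally can single it out, the chameleon cannot pin down the secret.
    Trial("fruit", words, "banana", ("it is a fruit", "yellow and curved"), 1),
    # A careless message: it names the secret outright, so it leaks.
    Trial("fruit", words, "cherry", ("i love cherry pie", "it is a fruit"), 0),
]

utility, leakage = evaluate(trials, toy_ally, toy_chameleon)
print(f"utility={utility:.2f} leakage={leakage:.2f}")
```

In this toy setup a good message drives utility up while keeping leakage low. The second trial shows the failure mode the benchmark targets: a message that is easy for the ally to identify but also reveals the secret to the adversary.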