Feb 23, 2026 · arXiv:2602.20059

Interaction Theater: A case of LLM Agents Interacting at Scale

AI Summary

This paper empirically investigates the interaction dynamics of LLM agents at scale using a dataset from Moltbook, an AI-agent-only social platform. By combining lexical specificity, semantic similarity, and LLM-as-judge validation, the authors characterize the quality of agent interactions, revealing a disconnect between the appearance of active discussion and the actual substance. The study finds that while agents produce diverse text, a majority of comments lack distinguishing content vocabulary related to the original post, and information gain from subsequent comments diminishes quickly, indicating a lack of meaningful engagement.

Key Contribution

Despite generating diverse text, LLM agents on a social platform mostly produce generic, off-topic, or spam comments, highlighting the need for explicit coordination mechanisms in multi-agent systems.

Abstract

As multi-agent architectures and agent-to-agent protocols proliferate, a fundamental question arises: what actually happens when autonomous LLM agents interact at scale? We study this question empirically using data from Moltbook, an AI-agent-only social platform, with 800K posts, 3.5M comments, and 78K agent profiles. We combine lexical metrics (Jaccard specificity), embedding-based semantic similarity, and LLM-as-judge validation to characterize agent interaction quality. Our findings reveal that agents produce diverse, well-formed text that creates the surface appearance of active discussion, but the substance is largely absent. Specifically, while most agents ($67.5\%$) vary their output across contexts, $65\%$ of comments share no distinguishing content vocabulary with the post they appear under, and information gain from additional comments decays rapidly. LLM-judge-based metrics classify the dominant comment types as spam ($28\%$) and off-topic content ($22\%$). Embedding-based semantic analysis confirms that lexically generic comments are also semantically generic. Agents rarely engage in threaded conversation ($5\%$ of comments), defaulting instead to independent top-level responses. We discuss implications for multi-agent interaction design, arguing that coordination mechanisms must be explicitly designed; without them, even large populations of capable agents produce parallel output rather than productive exchange.
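To make the lexical measure concrete, here is a minimal sketch of a Jaccard overlap between the content vocabulary of a post and a comment. It is an illustration only: the abstract does not give the paper's exact definition of "Jaccard specificity", so the tokenizer, the tiny `STOPWORDS` list, and the function names `content_words` and `jaccard_overlap` are all assumptions made for this example.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this
# sketch); a real analysis would use a fuller list.
STOPWORDS = frozenset({
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "is", "are",
    "it", "this", "that", "for", "with", "i", "you", "so", "very",
})


def content_words(text):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def jaccard_overlap(post, comment):
    """Plain Jaccard index |P & C| / |P | C| over content-word sets."""
    p, c = content_words(post), content_words(comment)
    union = p | c
    if not union:
        return 0.0  # both texts are empty after stopword removal
    return len(p & c) / len(union)


post = "Agents need explicit coordination protocols"
generic = "Great post, thanks for sharing!"
specific = "Which coordination protocols do agents need?"

print(jaccard_overlap(post, generic))   # no shared content words
print(jaccard_overlap(post, specific))  # four shared words out of seven
```

Under this sketch, a comment that scores zero shares no content vocabulary with its post, which is the kind of comment the abstract reports as making up $65\%$ of the dataset. Whether the authors also weight words by rarity or compare against other posts is not stated in the abstract.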

Eval Frameworks & Benchmarks · Natural Language Processing · Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
