Feb 26, 2026 | arXiv:2602.22752

Towards Simulating Social Media Users with LLMs: Evaluating the Operational Validity of Conditioned Comment Prediction

Nils Schwager, S. Munker, Simon Münker, Alistair Plum, Achim Rettinger

AI Summary

This paper introduces Conditioned Comment Prediction (CCP), a framework for evaluating how well LLMs can simulate social media user behavior by predicting comments conditioned on stimuli and comparing them to real user data. The study evaluates open-weight 8B models (Llama3.1, Qwen3, Ministral) in English, German, and Luxembourgish, finding a trade-off between form and content in low-resource settings when using Supervised Fine-Tuning (SFT). The results suggest that authentic behavioral traces are more effective than descriptive personas for high-fidelity user simulation, challenging current prompting strategies.

Key Contribution

Fine-tuning LLMs on social media data can improve the superficial style of generated comments but simultaneously degrade their semantic accuracy, revealing a critical disconnect between form and content.
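To make the form vs. content distinction concrete, here is a minimal sketch of how the two could be scored separately for one generated comment against the real one. It is an illustration only, not the paper's evaluation code: the function names `form_score` and `content_score` are hypothetical, the length ratio is an assumed proxy for surface form, and the Jaccard token overlap is a crude, assumed stand-in for semantic similarity.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def form_score(generated, reference):
    """Assumed form proxy: ratio of token counts, in [0, 1] (1.0 = same length)."""
    g, r = len(tokenize(generated)), len(tokenize(reference))
    if max(g, r) == 0:
        return 1.0
    return min(g, r) / max(g, r)


def content_score(generated, reference):
    """Assumed content proxy: Jaccard overlap of the two token sets, in [0, 1]."""
    g, r = set(tokenize(generated)), set(tokenize(reference))
    if not g and not r:
        return 1.0
    return len(g & r) / len(g | r)


# A comment that matches the reference in length but not in topic.
reference = "This policy will hurt small farmers"
generated = "I love this new phone so much"

print(f"form:    {form_score(generated, reference):.2f}")
print(f"content: {content_score(generated, reference):.2f}")
```

In this toy pair the generated comment has nearly the right length but shares almost no vocabulary with the real one, which is the pattern of high form agreement and low content agreement described above.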

Abstract

The transition of Large Language Models (LLMs) from exploratory tools to active "silicon subjects" in social science lacks extensive validation of operational validity. This study introduces Conditioned Comment Prediction (CCP), a task in which a model predicts how a user would comment on a given stimulus by comparing generated outputs with authentic digital traces. This framework enables a rigorous evaluation of current LLM capabilities with respect to the simulation of social media user behavior. We evaluated open-weight 8B models (Llama3.1, Qwen3, Ministral) in English, German, and Luxembourgish language scenarios. By systematically comparing prompting strategies (explicit vs. implicit) and the impact of Supervised Fine-Tuning (SFT), we identify a critical form vs. content decoupling in low-resource settings: while SFT aligns the surface structure of the text output (length and syntax), it degrades semantic grounding. Furthermore, we demonstrate that explicit conditioning (generated biographies) becomes redundant under fine-tuning, as models successfully perform latent inference directly from behavioral histories. Our findings challenge current "naive prompting" paradigms and offer operational guidelines prioritizing authentic behavioral traces over descriptive personas for high-fidelity simulation.
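The difference between explicit and implicit conditioning can be sketched as two ways of building a prompt for the same stimulus. This is a hedged illustration under stated assumptions: the prompt wording, the function names `build_explicit_prompt` and `build_implicit_prompt`, and the representation of a behavioral history as (stimulus, comment) pairs are all hypothetical and are not taken from the paper.

```python
def build_explicit_prompt(biography, stimulus):
    """Explicit conditioning: describe the user with a generated biography."""
    return (
        "You are simulating a social media user.\n"
        f"User biography: {biography}\n\n"
        f"Post: {stimulus}\n"
        "Comment:"
    )


def build_implicit_prompt(history, stimulus):
    """Implicit conditioning: show the user's past (stimulus, comment) pairs."""
    lines = ["You are simulating a social media user.", "Past behavior:"]
    for past_stimulus, past_comment in history:
        lines.append(f"Post: {past_stimulus}")
        lines.append(f"Comment: {past_comment}")
    lines.append("")  # blank line separates the history from the new stimulus
    lines.append(f"Post: {stimulus}")
    lines.append("Comment:")
    return "\n".join(lines)


# Hypothetical example data, invented for illustration only.
history = [
    ("City plans new bike lanes downtown", "Finally, about time!"),
    ("Fuel prices rise again", "Glad I cycle to work."),
]
stimulus = "Council votes on car-free Sundays"

print(build_explicit_prompt("Urban commuter who cycles daily.", stimulus))
print()
print(build_implicit_prompt(history, stimulus))
```

Either prompt would then be passed to the model, and the generated comment compared with the comment the user actually wrote.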

Eval Frameworks & Benchmarks, Natural Language Processing, Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 22

Year: 2026

Venue: N/A
