Apr 21, 2026 · arXiv:2604.18955

Assessing Capabilities of Large Language Models in Social Media Analytics: A Multi-task Quest

R. Davoudi, Kartik Thakkar, Nazanin Donyapour, Tyler Derr, Hamid Karimi

AI Summary

This paper benchmarks several modern LLMs (GPT-4, GPT-4o, GPT-3.5-Turbo, Gemini 1.5 Pro, DeepSeek-V3, Llama 3.2, and BERT) on three social media analytics tasks using a Twitter (X) dataset: authorship verification, post generation, and user attribute inference. For authorship verification, the study introduces a systematic sampling framework and tests generalization on tweets collected from January 2024 onward to mitigate "seen-data" bias. It also includes a user study that measures how real users perceive LLM-generated posts conditioned on their own writing. For attribute inference, occupations and interests are annotated with two standardized taxonomies and the LLMs are compared against existing baselines. Together, the results offer reproducible benchmarks for LLM-driven social media analytics.

Key Contribution

LLMs struggle to generate social media posts that real users perceive as authentic, even when conditioned on the user's own writing.

Abstract

In this study, we present the first comprehensive evaluation of modern LLMs - including GPT-4, GPT-4o, GPT-3.5-Turbo, Gemini 1.5 Pro, DeepSeek-V3, Llama 3.2, and BERT - across three core social media analytics tasks on a Twitter (X) dataset: (I) Social Media Authorship Verification, (II) Social Media Post Generation, and (III) User Attribute Inference. For authorship verification, we introduce a systematic sampling framework over diverse user and post selection strategies and evaluate generalization on newly collected tweets from January 2024 onward to mitigate "seen-data" bias. For post generation, we assess the ability of LLMs to produce authentic, user-like content using comprehensive evaluation metrics. Bridging Tasks I and II, we conduct a user study to measure real users' perceptions of LLM-generated posts conditioned on their own writing. For attribute inference, we annotate occupations and interests using two standardized taxonomies (IAB Tech Lab 2023 and 2018 U.S. SOC) and benchmark LLMs against existing baselines. Overall, our unified evaluation provides new insights and establishes reproducible benchmarks for LLM-driven social media analytics. The code and data are provided in the supplementary material and will also be made publicly available upon publication.

Eval Frameworks & Benchmarks · Natural Language Processing · Open-Source Models & Weights

Citation Metrics

Citations: 0

Influential citations: 0

References: 57

Year: 2026

Venue: N/A
