Apr 23, 2026 · arXiv:2604.21786

From Codebooks to VLMs: Evaluating Automated Visual Discourse Analysis for Climate Change on Social Media

Katharina Prasse, Steffen Jung, Isaac Bravo, Stefanie Walter, Patrick Knab, Christian Bartelt, Margret Keuper

AI Summary

This paper investigates the use of computer vision methods, specifically vision-language models (VLMs) and CLIP-like models, for automated analysis of climate change discourse on social media. The authors benchmark six promptable VLMs and 15 zero-shot CLIP-like models on two image datasets from X (formerly Twitter), evaluating performance across five annotation dimensions relevant to climate communication. Gemini-3.1-flash-lite performs best overall. In addition, distributional evaluation shows that VLMs can reliably recover population-level trends even with moderate per-image accuracy, which makes them useful for large-scale discourse analysis.

Key Contribution

VLMs can reliably reveal population-level trends in climate change discourse on social media, even when per-image accuracy is only moderate.
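The gap between per-image accuracy and population-level trends can be made concrete with a small example. The sketch below is a minimal illustration, not the authors' evaluation code: it assumes total variation distance between predicted and expert label shares as the distributional measure (this page does not name the paper's exact metric), and the image-type labels, `CLASSES`, and helper functions are all hypothetical.

```python
from collections import Counter

# Hypothetical image-type labels for ten images: expert annotations vs. model predictions.
CLASSES = ["photo", "illustration", "screenshot"]
gold = ["photo"] * 6 + ["illustration"] * 3 + ["screenshot"]
pred = ["illustration", "screenshot", "photo", "photo", "photo",
        "photo", "photo", "illustration", "illustration", "photo"]


def accuracy(predicted, expected):
    """Instance-level metric: share of images whose predicted label matches the expert label."""
    if len(predicted) != len(expected):
        raise ValueError("label lists must have equal length")
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


def label_shares(labels, classes):
    """Population-level view: fraction of images assigned to each class."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in classes}


def total_variation(p, q):
    """Half the L1 distance between two label distributions (0 = identical, 1 = disjoint)."""
    return 0.5 * sum(abs(p[c] - q[c]) for c in p)


acc = accuracy(pred, gold)
tv = total_variation(label_shares(pred, CLASSES), label_shares(gold, CLASSES))
print(f"per-image accuracy: {acc:.2f}")        # 0.60
print(f"total variation distance: {tv:.2f}")  # 0.00
```

Here four of ten predictions are wrong, yet the errors cancel out, so the predicted class shares match the expert shares exactly. Real predictions rarely cancel this neatly; the sketch only shows why the two kinds of metric can diverge.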

Abstract

Social media platforms have become primary arenas for climate communication, generating millions of images and posts that, if systematically analysed, can reveal which communication strategies mobilise public concern and which fall flat. We aim to facilitate such research by analysing how computer vision methods can be used for social media discourse analysis. This analysis includes application-based taxonomy design, model selection, prompt engineering, and validation. We benchmark six promptable vision-language models and 15 zero-shot CLIP-like models on two datasets from X (formerly Twitter): a 1,038-image expert-annotated set and a larger corpus of over 1.2 million images, with 50,000 labels manually validated. Both span five annotation dimensions: animal content, climate change consequences, climate action, image setting, and image type. Among the models benchmarked, Gemini-3.1-flash-lite outperforms all others across all super-categories and both datasets, while the gap to open-weight models of moderate size remains relatively small. Beyond instance-level metrics, we advocate for distributional evaluation: VLM predictions can reliably recover population-level trends even when per-image accuracy is moderate, making them a viable starting point for discourse analysis at scale. We find that chain-of-thought reasoning reduces rather than improves performance, and that prompt design tailored to each annotation dimension improves performance. We release tweet IDs and labels along with our code at https://github.com/KathPra/Codebooks2VLMs.git.
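Zero-shot CLIP-like models classify an image by comparing its embedding with text embeddings of class prompts, so prompt wording directly shapes predictions. The following is a minimal plain-Python sketch of that scoring step, not the authors' pipeline. It assumes precomputed embeddings (the toy three-dimensional vectors `image_emb` and `class_prompts` are hypothetical stand-ins for real encoder outputs), combines several prompts per class by prompt ensembling (a common CLIP practice, swapped in here for illustration), and uses a hypothetical `logit_scale` default of 100, roughly the scale used by OpenAI's CLIP.

```python
import math

# Hypothetical precomputed embeddings. In a real pipeline these would come from a
# CLIP-like model's image encoder and text encoder; here they are toy 3-d vectors.
image_emb = [0.7, 0.2, 0.1]
class_prompts = {
    # stand-ins for e.g. "a photo taken outdoors", "an outdoor scene"
    "outdoor": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    # stand-ins for e.g. "a photo taken indoors", "an indoor scene"
    "indoor": [[0.1, 0.9, 0.0], [0.2, 0.8, 0.1]],
}


def normalize(v):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def class_embedding(prompt_embs):
    """Prompt ensembling: average the normalised prompt embeddings, then re-normalise."""
    normed = [normalize(e) for e in prompt_embs]
    mean = [sum(col) / len(normed) for col in zip(*normed)]
    return normalize(mean)


def zero_shot_probs(img, prompts, logit_scale=100.0):
    """Softmax over scaled cosine similarities between the image and each class embedding."""
    img = normalize(img)
    names = list(prompts)
    logits = [logit_scale * sum(a * b for a, b in zip(img, class_embedding(prompts[n])))
              for n in names]
    top = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return {n: e / total for n, e in zip(names, exps)}


probs = zero_shot_probs(image_emb, class_prompts)
print(max(probs, key=probs.get))  # outdoor
```

Rewording or adding prompts in `class_prompts` changes the class embeddings and therefore the scores, which is one way prompt design tailored to an annotation dimension can shift predictions.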

Computer Vision · Eval Frameworks & Benchmarks · Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 69

Year: 2026

Venue: N/A
