Apr 28, 2026 · arXiv:2604.25370

GPT-Image-2 in the Wild: A Twitter Dataset of Self-Reported AI-Generated Images from the First Week of Deployment

Kidus Zewde, Simiao Ren, Xingyu Shen, Jenny Wu, Yuchen Zhou, Tommy Duong, Zikang Zhang, Ethan Traister

AI Summary

This paper introduces the GPT-Image-2 Twitter Dataset, comprising 10,217 images generated by GPT-Image-2 and shared on Twitter within the first week of its release. The dataset was curated with a multi-stage pipeline that combines text heuristics, "Made with AI" badge verification, and model name matching. Analysis finds detectable text in 82.0% of images and faces in 59.2%. It also shows that Twitter systematically removes C2PA content credentials, which prevents provenance verification.

Key Contribution

Twitter strips C2PA provenance data from AI-generated images, making it impossible to cryptographically verify their origin on the platform.
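
As a rough illustration of what "stripped" means in practice, the sketch below checks whether a JPEG still carries an embedded C2PA manifest. It is not the paper's code. It assumes the manifest is stored as a JUMBF box inside JPEG APP11 segments and looks only for the `c2pa` label in those segments; the function name `has_c2pa_jumbf` is hypothetical. It detects presence only and performs no cryptographic validation.

```python
import struct

APP11 = 0xEB  # JPEG marker segment assumed here to carry the JUMBF/C2PA data


def has_c2pa_jumbf(jpeg_bytes: bytes) -> bool:
    """Heuristic: True if an APP11 segment before the scan data mentions 'c2pa'."""
    if jpeg_bytes[:2] != b"\xff\xd8":  # not a JPEG (missing SOI marker)
        return False
    pos, n = 2, len(jpeg_bytes)
    while pos + 4 <= n:
        if jpeg_bytes[pos] != 0xFF:
            return False  # lost marker sync; give up rather than guess
        marker = jpeg_bytes[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:  # markers without a length
            pos += 2
            continue
        if marker in (0xD9, 0xDA):  # end of image or start of scan data
            return False
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        if length < 2:
            return False  # malformed segment length
        payload = jpeg_bytes[pos + 4:pos + 2 + length]
        if marker == APP11 and b"c2pa" in payload:
            return True
        pos += 2 + length
    return False
```

Running such a check on the original file and on the copy served by the platform would show whether the manifest survived the upload. A positive result would still need a full C2PA validator to confirm the signature.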

Abstract

The release of GPT-image-2 by OpenAI marks a watershed moment in AI-generated imagery: the boundary between photographic reality and synthetic content has never been more difficult to discern. We introduce the GPT-Image-2 Twitter Dataset, the first published dataset of GPT-image-2 generated images, sourced from publicly available Twitter/X posts in the immediate aftermath of the model's April 21, 2026 release. Leveraging the Twitter API v2 and a multi-stage curation pipeline spanning multilingual text heuristics (English, Japanese, and Chinese), browser-automated Twitter "Made with AI" badge verification, and model name variant matching, we curate 10,217 confirmed GPT-image-2 images from 27,662 collected records over a six-day window. We characterize the dataset across four analyses: CLIP-based zero-shot subject taxonomy, OCR text legibility (82.0% of images contain detectable text), face detection (59.2% of images, 22,583 total faces), and semantic clustering (137 CLIP ViT-L/14 clusters). A key negative result is that C2PA content credentials are systematically stripped by Twitter's CDN on upload, rendering cryptographic provenance verification infeasible for social-media-sourced AI images. The dataset and all curation code are released publicly.
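
The text-heuristic and model-name-matching stages can be pictured with a minimal sketch. The regular expression, the cue phrases, and the function names below are all hypothetical, since the abstract does not list the actual variants or heuristics used. The sketch only shows one plausible way to match spelling variants of the model name and combine that with self-report phrases in English, Japanese, and Chinese.

```python
import re
import unicodedata

# Hypothetical variant pattern: "GPT-Image-2", "gpt image 2", "gptimage2", ...
# The negative lookahead avoids matching names such as "gpt-image-20".
MODEL_PATTERN = re.compile(r"gpt[\s\-_]*image[\s\-_]*2(?!\d)")

# Hypothetical self-report cues (English, Japanese, Chinese).
SELF_REPORT_CUES = (
    "made with", "generated with", "generated by",
    "で生成", "で作った",
    "生成的", "生成了",
)


def normalize(text: str) -> str:
    """Fold full-width characters to ASCII (NFKC) and lowercase the result."""
    return unicodedata.normalize("NFKC", text).lower()


def mentions_model(text: str) -> bool:
    """True if the text contains any spelling variant of the model name."""
    return MODEL_PATTERN.search(normalize(text)) is not None


def is_candidate(text: str) -> bool:
    """True if the text names the model and also contains a self-report cue."""
    norm = normalize(text)
    return mentions_model(text) and any(cue in norm for cue in SELF_REPORT_CUES)
```

A filter like this is cheap enough to run over every collected record, leaving slower checks such as badge verification for the records that pass it.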

Computer Vision · Data Curation & Synthetic Data · Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 21

Year: 2026

Venue: N/A
