This paper introduces the GPT-Image-2 Twitter Dataset, comprising 10,217 images generated by GPT-Image-2 and shared on Twitter within the first week of its release. The dataset was curated using a multi-stage pipeline involving text heuristics, "Made with AI" badge verification, and model name matching. Analysis finds detectable text in 82.0% of images and faces in 59.2%. It also shows that Twitter systematically strips C2PA content credentials, which prevents provenance verification.
Twitter strips C2PA provenance data from AI-generated images, making it impossible to cryptographically verify their origin on the platform.
The release of GPT-image-2 by OpenAI marks a watershed moment in AI-generated imagery: the boundary between photographic reality and synthetic content has never been more difficult to discern. We introduce the GPT-Image-2 Twitter Dataset, the first published dataset of GPT-image-2 generated images, sourced from publicly available Twitter/X posts in the immediate aftermath of the model's April 21, 2026 release. Leveraging the Twitter API v2 and a multi-stage curation pipeline spanning multilingual text heuristics (English, Japanese, and Chinese), browser-automated Twitter "Made with AI" badge verification, and model name variant matching, we curate 10,217 confirmed GPT-image-2 images from 27,662 collected records over a six-day window. We characterize the dataset across four analyses: CLIP-based zero-shot subject taxonomy, OCR text legibility (82.0% of images contain detectable text), face detection (59.2% of images, 22,583 total faces), and semantic clustering (137 CLIP ViT-L/14 clusters). A key negative result is that C2PA content credentials are systematically stripped by Twitter's CDN on upload, rendering cryptographic provenance verification infeasible for social-media-sourced AI images. The dataset and all curation code are released publicly.
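To make the "model name variant matching" stage concrete, here is a minimal sketch of how such a filter might look. It is not the authors' released code: the regular expression, the Unicode normalization step, and the function name `mentions_model` are all assumptions chosen for illustration, and the real pipeline may accept a different set of variants.

```python
import re
import unicodedata

# Hypothetical pattern (an assumption, not taken from the paper):
# "gpt", then "image", then "2", with optional spaces, hyphens, or
# underscores in between. The negative lookahead rejects longer
# numbers such as "20".
MODEL_PATTERN = re.compile(r"gpt[\s\-_]*image[\s\-_]*2(?!\d)", re.IGNORECASE)


def mentions_model(text: str) -> bool:
    """Return True if the post text names the model in any accepted variant."""
    # NFKC folds full-width Latin letters and digits, which are common in
    # Japanese and Chinese posts, into their ASCII equivalents.
    normalized = unicodedata.normalize("NFKC", text)
    return MODEL_PATTERN.search(normalized) is not None


if __name__ == "__main__":
    samples = [
        "Made with GPT-Image-2!",
        "gpt image 2で作った",
        "ＧＰＴ-image-２ すごい",
        "Generated with DALL-E 3",
    ]
    for sample in samples:
        print(sample, "->", mentions_model(sample))
```

A text match like this only flags candidate posts. In the pipeline described above it is combined with the multilingual heuristics and the "Made with AI" badge check before an image is counted as confirmed.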