Apr 12, 2026arXiv:2604.10460

Toward Accountable AI-Generated Content on Social Platforms: Steganographic Attribution and Multimodal Harm Detection

Xinlei Guan, David Arosemena, Tejaswi Dhandu, Kuan Huang, Meng Xu, Miles Q. Li, Bingyu Shen, Ruiyang Qin, Umamaheswara Rao Tida, Boyang Li

AI Summary

This paper introduces a steganography-based attribution framework to trace AI-generated images used in harmful multimodal contexts. The framework embeds cryptographically signed identifiers into images using watermarking and employs a CLIP-based model for detecting harmful image-text combinations. Experiments show that wavelet-domain spread-spectrum watermarking is robust to distortions, and the multimodal detector achieves high accuracy (AUC-ROC of 0.99), enabling reliable attribution.

Key Contribution

Spotting AI-generated harms requires more than just image analysis: this system links images to their source even when paired with harmful text, enabling accountability.

Abstract

The rapid growth of generative AI has introduced new challenges in content moderation and digital forensics. In particular, benign AI-generated images can be paired with harmful or misleading text, creating difficult-to-detect misuse. This contextual misuse undermines the traditional moderation framework and complicates attribution, as synthetic images typically lack persistent metadata or device signatures. We introduce a steganography enabled attribution framework that embeds cryptographically signed identifiers into images at creation time and uses multimodal harmful content detection as a trigger for attribution verification. Our system evaluates five watermarking methods across spatial, frequency, and wavelet domains. It also integrates a CLIP-based fusion model for multimodal harmful-content detection. Experiments demonstrate that spread-spectrum watermarking, especially in the wavelet domain, provides strong robustness under blur distortions, and our multimodal fusion detector achieves an AUC-ROC of 0.99, enabling reliable cross-modal attribution verification. These components form an end-to-end forensic pipeline that enables reliable tracing of harmful deployments of AI-generated imagery, supporting accountability in modern synthetic media environments. Our code is available at GitHub: https://github.com/bli1/steganography

Constitutional AI & AI Ethics Multimodal Models Red-Teaming & Adversarial Robustness

Citation Metrics

Citations0

Influential citations0

References0

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

Toward Accountable AI-Generated Content on Social Platforms: Steganographic Attribution and Multimodal Harm Detection

Related Papers