The paper introduces AIForge-Doc, a benchmark dataset for evaluating the detection of AI-forged tampering in financial and form documents, focusing specifically on diffusion-model-based inpainting. The dataset comprises 4,061 forged images generated with Gemini 2.5 Flash Image and Ideogram v2 Edit from four public document datasets, with pixel-level annotations of the tampered regions. Benchmarking existing detectors (TruFor, DocTamper, GPT-4o) reveals a substantial performance drop relative to their results on traditionally edited forgeries, highlighting the challenge AI-forged documents pose to current forensic methods.
Existing document forgery detectors degrade sharply on AI-inpainted documents: on a new benchmark, DocTamper and a zero-shot GPT-4o judge fall to near-chance AUC, and TruFor drops well below its NIST16 result.
We present AIForge-Doc, the first dedicated benchmark targeting exclusively diffusion-model-based inpainting in financial and form documents with pixel-level annotation. Existing document forgery datasets rely on traditional digital editing tools (e.g., Adobe Photoshop, GIMP), creating a critical gap: state-of-the-art detectors are blind to the rapidly growing threat of AI-forged document fraud. AIForge-Doc addresses this gap by systematically forging numeric fields in real-world receipt and form images using two AI inpainting APIs -- Gemini 2.5 Flash Image and Ideogram v2 Edit -- yielding 4,061 forged images from four public document datasets (CORD, WildReceipt, SROIE, XFUND) across nine languages, annotated with pixel-precise tampered-region masks in DocTamper-compatible format. We benchmark three representative detectors -- TruFor, DocTamper, and a zero-shot GPT-4o judge -- and find that all existing methods degrade substantially: TruFor achieves AUC=0.751 (zero-shot, out-of-distribution) vs. AUC=0.96 on NIST16; DocTamper achieves AUC=0.563 vs. AUC=0.98 in-distribution, with pixel-level IoU=0.020; GPT-4o achieves only AUC=0.509 -- essentially at chance -- confirming that AI-forged values are largely indistinguishable from authentic ones for automated detectors and VLMs. These results demonstrate that AIForge-Doc represents a qualitatively new and unsolved challenge for document forensics.
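For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how image-level AUC and pixel-level IoU are conventionally computed. It is not the paper's evaluation code: the abstract does not specify the exact protocol (thresholds, averaging, handling of authentic images), and the function names `roc_auc` and `mask_iou` are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Image-level AUC: the probability that a forged image (label 1)
    receives a higher score than an authentic one (label 0).
    Ties count as half a win. An AUC of 0.5 corresponds to chance."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one forged and one authentic image")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mask_iou(pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]) -> float:
    """Pixel-level IoU between a predicted binary mask and the
    ground-truth tampered-region mask (both given as rows of 0/1)."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumption: two empty masks are treated as a perfect match.
    return inter / union if union else 1.0


if __name__ == "__main__":
    # Toy data, not taken from the benchmark.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
    print(mask_iou([[1, 1], [0, 0]], [[1, 0], [0, 0]]))
```

Under these definitions, an AUC near 0.5 means a detector's scores barely separate forged from authentic images, and an IoU of 0.020 means the predicted tampered region overlaps the annotated mask by about 2% of their combined area.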