This paper introduces LoRA-based Pairwise Training (LPT), a finetuning strategy for visual foundation models that improves the robustness of AI-generated image (AIGI) detection under severe distortions. LPT incorporates distortion and size simulations during training to better match real-world data distributions, and it employs a pairwise training process to decouple generalization and robustness optimization. Experiments show that LPT placed third in the NTIRE Robust AI-Generated Image Detection in the Wild challenge, indicating its effectiveness in handling complex, unpredictable distortions.
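The summary mentions distortion and size simulations but does not specify which distortions, probabilities, or target sizes LPT uses. The minimal NumPy sketch below illustrates the general idea under assumed settings: a random chain of down/up-sampling, Gaussian noise, and coarse quantization (a crude stand-in for lossy compression), followed by a random resize and crop. Every function name and parameter value here (`simulate_distortion`, `simulate_size`, `noise_std_max`, `sizes`, `crop`) is hypothetical and is not taken from the paper.

```python
import numpy as np


def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of an H x W (x C) array, no extra dependencies."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return img[rows][:, cols]


def simulate_distortion(img, rng, noise_std_max=0.05, min_scale=0.25):
    """Apply a random chain of assumed distortions to a float image in [0, 1]."""
    out = img.astype(np.float64)
    h, w = out.shape[:2]
    # Down- then up-sample to mimic resolution loss and blur.
    if rng.random() < 0.5:
        s = rng.uniform(min_scale, 1.0)
        small = resize_nearest(out, max(1, int(h * s)), max(1, int(w * s)))
        out = resize_nearest(small, h, w)
    # Additive Gaussian noise with a random strength.
    if rng.random() < 0.5:
        out = out + rng.normal(0.0, rng.uniform(0.0, noise_std_max), size=out.shape)
    # Coarse intensity quantization as a rough proxy for lossy compression.
    if rng.random() < 0.5:
        levels = int(rng.integers(8, 65))
        out = np.round(out * (levels - 1)) / (levels - 1)
    return np.clip(out, 0.0, 1.0)


def simulate_size(img, rng, sizes=(256, 384, 512), crop=224):
    """Resize the shorter side to a randomly chosen size, then random-crop crop x crop."""
    h, w = img.shape[:2]
    scale = int(rng.choice(sizes)) / min(h, w)
    nh, nw = max(crop, round(h * scale)), max(crop, round(w * scale))
    resized = resize_nearest(img, nh, nw)
    top = int(rng.integers(0, nh - crop + 1))
    left = int(rng.integers(0, nw - crop + 1))
    return resized[top:top + crop, left:left + crop]


rng = np.random.default_rng(0)
image = rng.random((300, 400, 3))  # stand-in for a training image in [0, 1]
sample = simulate_size(simulate_distortion(image, rng), rng)
print(sample.shape)  # (224, 224, 3)
```

Applying a fresh random draw on every call means the detector sees a different mix of degradations and resolutions each epoch; in practice the distortion types and ranges would be chosen to resemble the target (validation and test) data.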
Finetuning visual foundation models with LoRA-based pairwise training dramatically improves AIGI detection robustness against real-world distortions.
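LoRA finetuning keeps the pretrained weights frozen and learns only a low-rank update. Below is a minimal NumPy sketch of the standard LoRA formulation, h = W x + (alpha / r) B A x, for a single linear layer. The class name `LoRALinear`, the rank, the scaling, and the initialization scale are illustrative assumptions; the summary does not state which backbone, layers, or hyperparameters LPT uses.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank=4, alpha=8.0, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        self.weight = weight  # frozen pretrained weight, shape (d_out, d_in)
        d_out, d_in = weight.shape
        self.scale = alpha / rank
        # A gets a small random init and B starts at zero, so the update is initially zero.
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))

    def __call__(self, x):
        """Map a batch x of shape (batch, d_in) to (batch, d_out)."""
        frozen = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T  # low-rank path: d_in -> rank -> d_out
        return frozen + self.scale * update

    def trainable_parameters(self):
        """Only A and B would be updated during finetuning."""
        return self.A.size + self.B.size

    def merged_weight(self):
        """Fold the adapter into a single matrix for inference."""
        return self.weight + self.scale * self.B @ self.A


rng = np.random.default_rng(0)
W = rng.normal(size=(64, 128))  # stand-in for a pretrained weight
layer = LoRALinear(W, rank=4, alpha=8.0, rng=rng)
x = rng.normal(size=(2, 128))
print(np.allclose(layer(x), x @ W.T))  # True: B is zero at initialization
print(layer.trainable_parameters(), W.size)  # 768 8192
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and only 768 values would be trained here versus 8,192 in `W`. `merged_weight()` shows that the adapter can be folded back into one matrix after finetuning, so inference cost does not grow.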
The proliferation of highly realistic AI-Generated Images (AIGIs) has necessitated the development of practical detection methods. While current AIGI detectors perform well on clean datasets, their performance often degrades when they are deployed "in the wild", where images are subjected to unpredictable, complex distortions. To address this critical vulnerability, we propose a novel LoRA-based Pairwise Training (LPT) strategy designed specifically to achieve robust AIGI detection under severe distortions. The core of our strategy consists of targeted finetuning of a visual foundation model, deliberate simulation of the data distribution during training, and a unique pairwise training process. Specifically, we introduce distortion and size simulations to better match the data distributions of the validation and test sets. Leveraging the strong visual representation capability of the visual foundation model, we finetune it for AIGI detection. Pairwise training further improves detection by decoupling the optimization of generalization and robustness. Experiments show that our approach secured 3rd place in the NTIRE Robust AI-Generated Image Detection in the Wild challenge.
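The abstract states that pairwise training decouples generalization and robustness optimization but does not define the objective. One possible reading, sketched below in plain Python as an assumption rather than the paper's formulation, pairs each training image with a distorted copy: a binary cross-entropy term on the clean view drives generalization (label 1 = AI-generated), while a separate consistency term, weighted by its own coefficient `lam`, penalizes disagreement between the clean and distorted predictions. The function names and the squared-probability consistency term are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_with_logits(z, y):
    """Stable binary cross-entropy on a raw logit z with label y in {0, 1}."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def pairwise_loss(clean_logits, distorted_logits, labels, lam=1.0):
    """Hypothetical pairwise objective over (clean, distorted) image pairs.

    Returns (total, generalization_term, robustness_term).
    """
    n = len(labels)
    if n == 0 or len(clean_logits) != n or len(distorted_logits) != n:
        raise ValueError("expected equal-length, non-empty inputs")
    # Generalization: classify the clean view of each image correctly.
    gen = sum(bce_with_logits(z, y) for z, y in zip(clean_logits, labels)) / n
    # Robustness: the distorted view should agree with the clean view.
    rob = sum(
        (sigmoid(zd) - sigmoid(zc)) ** 2
        for zc, zd in zip(clean_logits, distorted_logits)
    ) / n
    return gen + lam * rob, gen, rob


clean = [2.0, -1.5, 0.3]
distorted = [0.5, -0.2, 0.1]
labels = [1, 0, 1]
total, gen, rob = pairwise_loss(clean, distorted, labels, lam=0.5)
print(f"total={total:.4f} gen={gen:.4f} rob={rob:.4f}")
```

Keeping the two terms separate lets each be weighted or scheduled independently, which is one way to "decouple" the objectives; other readings, such as alternating updates on clean and distorted batches, are equally compatible with the abstract.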