Mar 30, 2026 · arXiv:2603.28043

Seeing the Unseen: Rethinking Illicit Promotion Detection with In-Context Learning

AI Summary

This paper studies In-Context Learning (ICL) as a unified framework for detecting illicit online promotions. It shows that ICL can generalize to unseen threats, discover new illicit categories autonomously, and transfer across platforms without adaptation. When properly configured, ICL achieves performance comparable to fine-tuned models using 22x fewer labeled examples. Key results include the discovery of eight previously undocumented illicit categories and 92.6% accuracy on 200,000 real-world samples from search engines and Twitter.

Key Contribution

Forget retraining: In-Context Learning detects novel online scams and illicit content at near fine-tuned performance and transfers across platforms without adaptation.

Abstract

Illicit online promotion is a persistent threat that evolves to evade detection. Existing moderation systems remain tethered to platform-specific supervision and static taxonomies, a reactive paradigm that struggles to generalize across domains or uncover novel threats. This paper presents a systematic study of In-Context Learning (ICL) as a unified framework for illicit promotion detection. Through rigorous analysis, we show that properly configured ICL achieves performance comparable to fine-tuned models using 22x fewer labeled examples. We demonstrate three key capabilities: (1) Generalization to unseen threats: ICL generalizes to new illicit categories without category-specific demonstrations, with a performance drop of less than 6% for most evaluated categories. (2) Autonomous discovery: A novel two-stage pipeline distills 2,900 free-form labels into coherent taxonomies, surfacing eight previously undocumented illicit categories such as usury and illegal immigration. (3) Cross-platform generalization: Deployed on 200,000 real-world samples from search engines and Twitter without adaptation, ICL achieves 92.6% accuracy. Furthermore, 61.8% of its uniquely flagged samples correspond to borderline or obfuscated content missed by existing detectors. Our findings position ICL as a new paradigm for content moderation, combining the precision of specialized classifiers with cross-platform generalization and autonomous threat discovery. By shifting to inference-time reasoning, ICL offers a path toward proactively adaptive moderation systems.
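The abstract does not spell out how the ICL prompts are configured, so the following is only a minimal sketch of the general technique: a task instruction, a handful of labeled demonstrations, and the unlabeled query are concatenated into one prompt for a language model to complete. The names `Demonstration`, `INSTRUCTION`, and `build_icl_prompt`, the instruction wording, and the example texts and labels are all hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Demonstration:
    """One labeled example shown to the model in the prompt (hypothetical type)."""
    text: str
    label: str


# Hypothetical instruction wording; the paper's actual prompt is not given here.
INSTRUCTION = (
    "Decide whether the text promotes an illicit good or service. "
    "Answer with a short category label, or 'benign' if it does not."
)


def build_icl_prompt(demos, query):
    """Assemble the instruction, labeled demonstrations, and the unlabeled query."""
    parts = [INSTRUCTION, ""]
    for demo in demos:
        parts.append(f"Text: {demo.text}")
        parts.append(f"Label: {demo.label}")
        parts.append("")
    # The query is left unlabeled so the model completes the final "Label:" line.
    parts.append(f"Text: {query}")
    parts.append("Label:")
    return "\n".join(parts)


# Made-up demonstrations for illustration only.
demos = [
    Demonstration("Weather update: light rain expected this weekend.", "benign"),
    Demonstration("Replica designer watches, no customs checks, message us.", "counterfeit goods"),
]
prompt = build_icl_prompt(demos, "Fast cash today, no paperwork, daily interest applies.")
print(prompt)
```

Because the label is free-form text rather than a fixed class index, a prompt of this shape leaves room for the model to name a category that appears in no demonstration, which is the setting the abstract describes for unseen threats and autonomous discovery.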

Natural Language Processing · Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
