Mila, Dutch Police, Twente | Mar 2, 2026 | arXiv:2603.01624

Assessing Crime Disclosure Patterns in a Large-Scale Cybercrime Forum

Raphael Hoheisel, Tom Meurs, Jai Wientjes, Marianne Junger, Abhishta Abhishta, Masarah Paquet-Clouston

AI Summary

This paper investigates crime disclosure patterns in a large cybercrime forum by analyzing more than 3.5 million posts from nearly 300,000 users. The authors use a three-level classification scheme (benign, grey, crime) and a scalable LLM-powered labeling pipeline to measure crime disclosure in initial posts, track how users move between disclosure levels, and relate disclosure behavior to private communications. The study finds that about one quarter of initial posts contain explicit crime-related content, but most users show restraint and escalate their disclosures gradually, with grey content being particularly prevalent.
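The summary mentions an LLM-powered labelling pipeline but does not describe it. The sketch below is a hypothetical illustration, not the authors' pipeline: the model call is abstracted as an assumed `llm` callable (list of prompts in, list of replies out), the prompt wording is invented, and `parse_level` and `label_posts` are illustrative names. It shows only the deterministic parts: batching posts, building prompts, and strictly mapping replies onto the three levels, leaving ambiguous replies unlabelled (`None`) instead of guessing.

```python
import re
from typing import Callable, List, Optional, Sequence

# Hypothetical prompt; the paper's actual instructions are not reproduced here.
PROMPT = (
    "Classify the forum post into exactly one level: benign, grey, or crime.\n"
    "Answer with the single word only.\n\nPost:\n{post}"
)

# Whole-word match on the level names; "gray" is accepted as a spelling variant.
_LEVEL_RE = re.compile(r"\b(benign|grey|gray|crime)\b", re.IGNORECASE)


def parse_level(reply: str) -> Optional[str]:
    """Map a raw model reply to one level, or None unless exactly one level is named."""
    found = {m.lower() for m in _LEVEL_RE.findall(reply)}
    found = {"grey" if f == "gray" else f for f in found}
    return found.pop() if len(found) == 1 else None


def label_posts(
    posts: Sequence[str],
    llm: Callable[[List[str]], List[str]],
    batch_size: int = 32,
) -> List[Optional[str]]:
    """Label posts in fixed-size batches using an injected (hypothetical) LLM callable."""
    labels: List[Optional[str]] = []
    for start in range(0, len(posts), batch_size):
        batch = posts[start:start + batch_size]
        replies = llm([PROMPT.format(post=p) for p in batch])
        if len(replies) != len(batch):
            raise ValueError("LLM returned a different number of replies than prompts")
        labels.extend(parse_level(r) for r in replies)
    return labels
```

Injecting the model as a callable keeps the sketch independent of any particular LLM client and makes it testable with a stub; unparseable replies can then be routed to manual review.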

Key Contribution

One in four initial posts on a major cybercrime forum contains explicit crime-related content, revealing a high baseline of openly disclosed criminal activity.
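Headline figures such as "one in four initial posts" and "more than one third of users" are simple shares, one over labelled initial posts and one over users. A minimal sketch, assuming the input is hypothetical `(user_id, level)` pairs for initial posts only; the function name `disclosure_summary` is illustrative, and any toy data passed to it is invented rather than taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

LEVELS = ("benign", "grey", "crime")


def disclosure_summary(posts: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Summarise disclosure over labelled initial posts.

    posts: hypothetical (user_id, level) pairs, one per initial post.
    """
    n_posts = 0
    n_crime_posts = 0
    levels_by_user: Dict[str, Set[str]] = defaultdict(set)
    for user, level in posts:
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level!r}")
        n_posts += 1
        if level == "crime":
            n_crime_posts += 1
        levels_by_user[user].add(level)
    if n_posts == 0:
        raise ValueError("no labelled posts")

    n_users = len(levels_by_user)
    # A user "discloses crime" if at least one of their initial posts is labelled crime.
    n_crime_users = sum(1 for levels in levels_by_user.values() if "crime" in levels)
    return {
        "crime_post_share": n_crime_posts / n_posts,
        "users_with_crime_share": n_crime_users / n_users,
        # With only three levels, "no crime post" means benign and/or grey only.
        "users_benign_or_grey_only_share": (n_users - n_crime_users) / n_users,
    }
```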

Abstract

Cybercrime forums play a central role in the cybercrime ecosystem, serving as hubs for the exchange of illicit goods, services, and knowledge. Previous studies have explored the market and social structures of these forums, but less is known about the behavioral dynamics of users, particularly regarding participants' disclosure of criminal activity. This study provides the first large-scale assessment of crime disclosure patterns in a major cybercrime forum, analysing over 3.5 million posts from nearly 300,000 users. Using a three-level classification scheme (benign, grey, and crime) and a scalable labelling pipeline powered by large language models (LLMs), we measure the level of crime disclosure present in initial posts, analyse how participants switch between levels, and assess how crime disclosure behavior relates to private communications. Our results show that crime disclosure is relatively normative: one quarter of initial posts include explicit crime-related content, and more than one third of users disclose criminal activity at least once in their initial posts. At the same time, most participants show restraint, with over two-thirds posting only benign or grey content and typically escalating disclosure gradually. Grey initial posts are particularly prominent, indicating that many users avoid overt statements and instead anchor their activity in ambiguous content. The study highlights the value of LLM-based text classification and Markov chain modelling for capturing crime disclosure patterns, offering insights for law enforcement efforts aimed at distinguishing benign, grey, and criminal content in cybercrime forums.
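The abstract credits Markov chain modelling with capturing how participants switch between levels. A minimal first-order sketch, assuming each user's labelled initial posts can be ordered by a timestamp and that the only states are the three disclosure levels (the paper may define states or time steps differently): count transitions between each user's consecutive posts and normalise each row. The name `transition_matrix` and the input shape are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

LEVELS = ("benign", "grey", "crime")


def transition_matrix(
    posts: Iterable[Tuple[str, int, str]],
) -> Dict[str, Dict[str, float]]:
    """Estimate first-order transition probabilities between disclosure levels.

    posts: hypothetical (user_id, timestamp, level) triples, one per labelled initial post.
    Returns P[a][b]: share of a-labelled posts whose next post by the same user is b.
    """
    by_user: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for user, ts, level in posts:
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level!r}")
        by_user[user].append((ts, level))

    counts = {a: {b: 0 for b in LEVELS} for a in LEVELS}
    for seq in by_user.values():
        seq.sort(key=lambda item: item[0])  # chronological order within each user
        # Transitions are counted only between consecutive posts of the same user.
        for (_, prev), (_, nxt) in zip(seq, seq[1:]):
            counts[prev][nxt] += 1

    matrix: Dict[str, Dict[str, float]] = {}
    for a in LEVELS:
        total = sum(counts[a].values())
        # Rows with no observed outgoing transitions are left as all zeros.
        matrix[a] = {b: (counts[a][b] / total if total else 0.0) for b in LEVELS}
    return matrix
```

With only three ordered states, one illustrative (not the paper's) reading of gradual escalation is more probability mass on benign→grey and grey→crime than on benign→crime.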

Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 49

Year: 2026

Venue: N/A
