NASK National Research Institute, Polish-Japanese Academy of Information Technology, University of Padua

Mar 15, 2026

arXiv:2603.14525

MALicious INTent Dataset and Inoculating LLMs for Enhanced Disinformation Detection

Arkadiusz Modzelewski, Witold Sosnowski, Eleni Papadopulos, Elisa Sartori, Tiziano Labruna, Giovanni Da San Martino, Adam Wierzbicki

AI Summary

The authors introduce MALINT, a new human-annotated English corpus designed to capture disinformation and its malicious intent, created in collaboration with expert fact-checkers. They benchmarked 12 language models, including SLMs and LLMs, on binary and multilabel intent classification tasks using MALINT. They then propose and evaluate intent-based inoculation, an intent-augmented reasoning approach for LLMs, demonstrating improved zero-shot disinformation detection across six datasets, five LLMs, and seven languages.

Key Contribution

Knowing the *intent* behind disinformation can significantly improve LLMs' ability to detect it, paving the way for more robust defenses against malicious narratives.

Abstract

The intentional creation and spread of disinformation pose a significant threat to public discourse. However, existing English datasets and research rarely address the intentionality behind disinformation. This work presents MALINT, the first human-annotated English corpus developed in collaboration with expert fact-checkers to capture disinformation and its malicious intent. We use our novel corpus to benchmark 12 language models, including small language models (SLMs) such as BERT and large language models (LLMs) such as Llama 3.3, on binary and multilabel intent classification tasks. Moreover, inspired by inoculation theory from psychology and communication studies, we investigate whether incorporating knowledge of malicious intent can improve disinformation detection. To this end, we propose intent-based inoculation, an intent-augmented reasoning approach for LLMs that integrates intent analysis to mitigate the persuasive impact of disinformation. Analysis on six disinformation datasets, five LLMs, and seven languages shows that intent-augmented reasoning improves zero-shot disinformation detection. To support research in intent-aware disinformation detection, we release the MALINT dataset with annotations from each annotation step.
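As a rough illustration of what intent-augmented zero-shot detection could look like, the sketch below runs two LLM calls: one to analyze possible malicious intent, and one to classify the text with that analysis in context. This is not the authors' prompt or pipeline. The function names, the prompt wording, the `llm` callable, and the placeholder intent labels are all assumptions made for illustration; the abstract does not list the MALINT intent categories.

```python
from typing import Callable, Sequence


def build_intent_prompt(text: str, intent_labels: Sequence[str]) -> str:
    """Ask the model which (hypothetical) intent labels apply to the text."""
    labels = ", ".join(intent_labels)
    return (
        "Analyze the text below and state which of these intents, if any, "
        f"it appears to pursue: {labels}.\n\nText: {text}"
    )


def build_detection_prompt(text: str, intent_analysis: str) -> str:
    """Ask for a verdict, with the intent analysis included as context."""
    return (
        "Considering the intent analysis, answer with exactly one word, "
        "DISINFORMATION or CREDIBLE.\n\n"
        f"Intent analysis: {intent_analysis}\n\nText: {text}"
    )


def detect_with_intent(
    text: str,
    intent_labels: Sequence[str],
    llm: Callable[[str], str],
) -> bool:
    """Two-step zero-shot detection: intent analysis first, then the verdict.

    `llm` is any function mapping a prompt string to a completion string
    (an assumption; plug in whichever client you use).
    """
    intent_analysis = llm(build_intent_prompt(text, intent_labels))
    verdict = llm(build_detection_prompt(text, intent_analysis))
    return verdict.strip().lower().startswith("disinformation")
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client and makes it easy to test with a stub.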

Eval Frameworks & Benchmarks, Natural Language Processing, Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
