MBZUAI | Mar 18, 2026 | arXiv:2603.17759

Harm or Humor: A Multimodal, Multilingual Benchmark for Overt and Covert Harmful Humor

Ahmed Sharshar, Hosam Elgendy, Saad El Dine Ahmed, Yasser Rohaim, Yuxia Wang

AI Summary

This paper introduces a new multimodal, multilingual benchmark dataset, HarmHumor, designed to evaluate the ability of AI models to detect harmful and offensive humor, distinguishing between safe, explicit, and implicit (covert) categories. The dataset includes 3,000 texts and 6,000 images in English and Arabic, as well as 1,200 videos in English, Arabic, and language-independent contexts. Evaluations of SOTA models on HarmHumor reveal a significant performance gap between closed-source and open-source models, as well as between English and Arabic, highlighting the need for culturally sensitive safety alignment.

Key Contribution

Current AI safety filters can't tell a joke from a threat, especially when humor relies on cultural context – this new benchmark exposes that blind spot.

Abstract

Dark humor often relies on subtle cultural nuances and implicit cues that require contextual reasoning to interpret, posing safety challenges that current static benchmarks fail to capture. To address this, we introduce a novel multimodal, multilingual benchmark for detecting and understanding harmful and offensive humor. Our manually curated dataset comprises 3,000 texts and 6,000 images in English and Arabic, alongside 1,200 videos that span English, Arabic, and language-independent (universal) contexts. Unlike standard toxicity datasets, we enforce a strict annotation guideline: distinguishing Safe jokes from Harmful ones, with the latter further classified into Explicit (overt) and Implicit (covert) categories to probe deep reasoning. We systematically evaluate state-of-the-art (SOTA) open-source and closed-source models across all modalities. Our findings reveal that closed-source models significantly outperform open-source ones, and that both groups show a notable performance gap between English and Arabic, underscoring the critical need for culturally grounded, reasoning-aware safety alignment. Warning: this paper contains example data that may be offensive, harmful, or biased.
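The annotation scheme described in the abstract (Safe versus Harmful, with Harmful split into Explicit and Implicit) can be pictured as a small two-level label hierarchy. The sketch below is only an illustration under assumptions: the names `HumorLabel`, `is_harmful`, and `parse_label` are hypothetical and do not come from the paper, and the dataset's actual file format and label strings are not stated here.

```python
from enum import Enum
from typing import Optional


class HumorLabel(Enum):
    """Fine-grained labels named in the abstract (string values are assumed)."""
    SAFE = "safe"
    EXPLICIT = "explicit"  # overt harmful humor
    IMPLICIT = "implicit"  # covert harmful humor


def is_harmful(label: HumorLabel) -> bool:
    """Collapse a fine-grained label to the top-level Safe/Harmful split."""
    return label is not HumorLabel.SAFE


def parse_label(raw: str) -> Optional[HumorLabel]:
    """Map a free-text answer to a label; return None if it is unrecognized."""
    cleaned = raw.strip().lower()
    for label in HumorLabel:
        if cleaned == label.value:
            return label
    return None
```

With this layout, a binary harmful-or-not score and a three-way score can both be derived from the same annotation, which is one plausible way to compare models on the overt and covert cases separately.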

Constitutional AI & AI Ethics | Eval Frameworks & Benchmarks | Multimodal Models | Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 53

Year: 2026

Venue: N/A
