Search papers, labs, and topics across Lattice.
This paper introduces a new multimodal, multilingual benchmark dataset, HarmHumor, designed to evaluate the ability of AI models to detect harmful and offensive humor, distinguishing between safe, explicit, and implicit (covert) categories. The dataset includes 3,000 texts and 6,000 images in English and Arabic, as well as 1,200 videos in English, Arabic, and language-independent contexts. Evaluations of SOTA models on HarmHumor reveal a significant performance gap between closed-source and open-source models, as well as between English and Arabic, highlighting the need for culturally sensitive safety alignment.
Current AI safety filters can't tell a joke from a threat, especially when humor relies on cultural context – this new benchmark exposes that blind spot.
Dark humor often relies on subtle cultural nuances and implicit cues that require contextual reasoning to interpret, posing safety challenges that current static benchmarks fail to capture. To address this, we introduce a novel multimodal, multilingual benchmark for detecting and understanding harmful and offensive humor. Our manually curated dataset comprises 3,000 texts and 6,000 images in English and Arabic, alongside 1,200 videos that span English, Arabic, and language-independent (universal) contexts. Unlike standard toxicity datasets, we enforce a strict annotation guideline: distinguishing \emph{Safe} jokes from \emph{Harmful} ones, with the latter further classified into \emph{Explicit} (overt) and \emph{Implicit} (Covert) categories to probe deep reasoning. We systematically evaluate state-of-the-art (SOTA) open and closed-source models across all modalities. Our findings reveal that closed-source models significantly outperform open-source ones, with a notable difference in performance between the English and Arabic languages in both, underscoring the critical need for culturally grounded, reasoning-aware safety alignment. \textcolor{red}{Warning: this paper contains example data that may be offensive, harmful, or biased.}