SEAHateCheck, a new functional test suite, was created to evaluate hate speech detection models in Indonesian, Tagalog, Thai, and Vietnamese. The dataset builds upon HateCheck and SGHateCheck, incorporating culturally relevant test cases generated with LLMs and validated by local experts. Experiments using state-of-the-art models revealed significant limitations in detecting hate speech, particularly in Tagalog and slang-based expressions, highlighting challenges in low-resource languages.
Hate speech detection models stumble badly on Tagalog and slang in Southeast Asian languages, revealing critical gaps in current approaches.
Hate speech detection relies heavily on linguistic resources, which are primarily available in high-resource languages such as English and Chinese. This creates barriers for researchers and platforms developing tools for low-resource languages in Southeast Asia, where diverse socio-linguistic contexts complicate online hate moderation. To address this, we introduce SEAHateCheck, a pioneering dataset tailored to Indonesia, Thailand, the Philippines, and Vietnam, covering Indonesian, Tagalog, Thai, and Vietnamese. Building on HateCheck's functional testing framework and refining SGHateCheck's methods, SEAHateCheck provides culturally relevant test cases, augmented by large language models and validated by local experts for accuracy. Experiments with state-of-the-art and multilingual models revealed limitations in detecting hate speech in specific low-resource languages. Across languages, Tagalog test cases showed the lowest model accuracy, likely due to linguistic complexity and limited training data; across functional categories, slang-based tests proved the hardest, as models struggled with culturally nuanced expressions. The diagnostic insights of SEAHateCheck further exposed model weaknesses in implicit hate detection and models' struggles with counter-speech expressions. As the first functional test suite for these Southeast Asian languages, this work equips researchers with a robust benchmark, advancing the development of practical, culturally attuned hate speech detection tools for inclusive online content moderation.
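To illustrate the evaluation style described above, here is a minimal sketch of how a HateCheck-style functional test suite scores a classifier per language and per functionality. The test cases, functionality names, and the trivial baseline model below are illustrative placeholders, not real SEAHateCheck data or the paper's actual experimental setup.

```python
from collections import defaultdict

# Hypothetical miniature test suite in the HateCheck style: each case records
# a language code, a functionality label, the input text, and a gold label.
# (Placeholder examples only; real SEAHateCheck cases are expert-validated.)
test_cases = [
    {"lang": "tl", "functionality": "slang", "text": "slang-based insult (placeholder)", "gold": "hateful"},
    {"lang": "tl", "functionality": "counter_speech", "text": "quoting hate to condemn it (placeholder)", "gold": "non-hateful"},
    {"lang": "id", "functionality": "slang", "text": "slang-based insult (placeholder)", "gold": "hateful"},
]

def evaluate(model, cases):
    """Return accuracy per (language, functionality) pair for a binary classifier."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for case in cases:
        key = (case["lang"], case["functionality"])
        total[key] += 1
        if model(case["text"]) == case["gold"]:
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}

# Trivial baseline that never predicts hate; a real study would plug in a
# fine-tuned or multilingual transformer classifier here instead.
always_non_hateful = lambda text: "non-hateful"
scores = evaluate(always_non_hateful, test_cases)
```

Grouping accuracy by (language, functionality) is what lets a suite like this surface the fine-grained failure modes the abstract reports, such as low scores on Tagalog cases or on slang-based tests.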