The authors introduce Yor-Sarc, the first gold-standard dataset for sarcasm detection in Yor\`{u}b\'{a}, a low-resource African language. The dataset contains 436 instances annotated by three native speakers using a novel annotation protocol tailored to Yor\`{u}b\'{a} sarcasm, which incorporates context and community-informed guidelines. It achieves substantial to almost perfect inter-annotator agreement, which supports its quality and its potential use in research on semantic interpretation and culturally informed NLP.
Sarcasm detection in Yor\`{u}b\'{a} now has a gold-standard dataset with substantial to almost perfect inter-annotator agreement (Fleiss' $\kappa = 0.7660$), paving the way for culturally nuanced NLP in African languages.
Sarcasm detection poses a fundamental challenge in computational semantics, requiring models to resolve disparities between literal and intended meaning. The challenge is amplified in low-resource languages, where annotated datasets are scarce or nonexistent. We present \textbf{Yor-Sarc}, the first gold-standard dataset for sarcasm detection in Yor\`{u}b\'{a}, a tonal Niger-Congo language spoken by over $50$ million people. The dataset comprises 436 instances annotated by three native speakers from diverse dialectal backgrounds using a culturally informed annotation protocol designed specifically for Yor\`{u}b\'{a} sarcasm. The protocol incorporates context-sensitive interpretation and community-informed guidelines, and it is accompanied by a comprehensive analysis of inter-annotator agreement to support replication in other African languages. Annotators reached substantial to almost perfect agreement (Fleiss' $\kappa = 0.7660$; pairwise Cohen's $\kappa = 0.6732$--$0.8743$), with unanimous consensus on $83.3\%$ of instances. One annotator pair achieved almost perfect agreement ($\kappa = 0.8743$; $93.8\%$ raw agreement), exceeding several benchmarks reported in English sarcasm research. The remaining $16.7\%$ of instances, which were decided by majority agreement, are preserved as soft labels for uncertainty-aware modelling. Yor-Sarc\footnote{https://github.com/toheebadura/yor-sarc} is expected to facilitate research on semantic interpretation and culturally informed NLP for low-resource African languages.
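The agreement statistics and soft labels mentioned above can be illustrated with a minimal sketch. This is not the authors' code: it assumes a binary label encoding (1 = sarcastic, 0 = not sarcastic), the function names are hypothetical, and the six toy items are invented purely for illustration rather than drawn from Yor-Sarc. The formulas are the standard definitions of Cohen's kappa (two annotators) and Fleiss' kappa (any fixed number of annotators per item).

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences of equal length."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    ca, cb = Counter(a), Counter(b)
    # Chance agreement from each annotator's own label distribution.
    p_e = sum(ca[c] * cb[c] for c in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` holds one list of labels per item,
    with the same number of annotators for every item."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({lab for item in ratings for lab in item})
    counts = [Counter(item) for item in ratings]
    # Overall proportion of assignments falling in each category.
    p_j = [sum(c[cat] for c in counts) / (n_items * n_raters)
           for cat in categories]
    # Per-item agreement among annotator pairs.
    p_i = [(sum(c[cat] ** 2 for cat in categories) - n_raters)
           / (n_raters * (n_raters - 1)) for c in counts]
    p_bar = sum(p_i) / n_items
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


def soft_label(item, positive=1):
    """Fraction of annotators who chose the positive (sarcastic) label."""
    return sum(lab == positive for lab in item) / len(item)


# Toy annotations (invented): rows are items, columns are three annotators.
toy = [
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]

unanimous = sum(len(set(item)) == 1 for item in toy) / len(toy)
soft = [soft_label(item) for item in toy]
pairwise = {
    (i, j): cohen_kappa([item[i] for item in toy], [item[j] for item in toy])
    for i, j in combinations(range(3), 2)
}

print("Fleiss' kappa:", round(fleiss_kappa(toy), 4))
print("Pairwise Cohen's kappa:", {k: round(v, 4) for k, v in pairwise.items()})
print("Unanimous fraction:", round(unanimous, 4))
print("Soft labels:", [round(s, 3) for s in soft])
```

Under this encoding, a unanimous item has a soft label of exactly 0 or 1, while a 2-to-1 split yields 1/3 or 2/3, so the disagreement is kept rather than collapsed into a hard majority vote. Both kappa functions are undefined when chance agreement equals 1 (every annotator uses a single label throughout), a case this sketch does not guard against.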