The paper introduces AfrIFact, a new dataset for automatic fact-checking in ten African languages and English, covering the full pipeline of information retrieval, evidence extraction, and fact-checking. Experiments reveal limitations in the cross-lingual retrieval capabilities of embedding models and highlight the difficulty of retrieving healthcare-domain documents compared to cultural or news documents. The study demonstrates that while LLMs struggle with multilingual fact verification in African languages, few-shot prompting and task-specific fine-tuning significantly improve performance, particularly with the AfriqueQwen-14B model.
LLMs' fact-checking abilities in African languages are surprisingly weak, but can be boosted by up to 43% with few-shot prompting and by a further 26% with task-specific fine-tuning.
Assessing the veracity of a claim made online is a complex and important task with real-world implications. When such claims are directed at communities with limited access to information and the content concerns issues such as healthcare and culture, the consequences intensify, especially in low-resource languages. In this work, we introduce AfrIFact, a dataset that covers the necessary steps for automatic fact-checking (i.e., information retrieval, evidence extraction, and fact-checking) in ten African languages and English. Our evaluation results show that even the best embedding models lack cross-lingual retrieval capabilities, and that cultural and news documents are easier to retrieve than healthcare-domain documents, both in large corpora and in single documents. We show that LLMs lack robust multilingual fact-verification capabilities in African languages; few-shot prompting improves performance by up to 43% for AfriqueQwen-14B, and task-specific fine-tuning further improves fact-checking accuracy by up to 26%. These findings, along with our release of the AfrIFact dataset, encourage work on low-resource information retrieval, evidence extraction, and fact-checking.
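As a rough illustration of the few-shot setup the abstract describes, the minimal Python sketch below assembles a few-shot claim-verification prompt. The label set (SUPPORTED / REFUTED / NOT ENOUGH INFO), the example claims, and the `build_fact_check_prompt` helper are illustrative assumptions for this sketch, not the prompts or labels actually used in AfrIFact.

```python
# Illustrative sketch only: the label names, example claims, and prompt wording
# below are assumptions for demonstration, not the AfrIFact prompts or labels.

FEW_SHOT_EXAMPLES = [
    {
        "claim": "The Nile is the longest river in Africa.",
        "evidence": "The Nile flows about 6,650 km, longer than any other African river.",
        "label": "SUPPORTED",
    },
    {
        "claim": "Lagos is the capital of Kenya.",
        "evidence": "Nairobi is the capital of Kenya; Lagos is a city in Nigeria.",
        "label": "REFUTED",
    },
]


def build_fact_check_prompt(claim: str, evidence: str) -> str:
    """Assemble a few-shot prompt asking a model to classify a claim against evidence."""
    lines = [
        "Decide whether the evidence SUPPORTS or REFUTES the claim, "
        "or if there is NOT ENOUGH INFO.\n"
    ]
    for ex in FEW_SHOT_EXAMPLES:
        lines.append(f"Claim: {ex['claim']}")
        lines.append(f"Evidence: {ex['evidence']}")
        lines.append(f"Label: {ex['label']}\n")
    lines.append(f"Claim: {claim}")
    lines.append(f"Evidence: {evidence}")
    lines.append("Label:")
    return "\n".join(lines)


if __name__ == "__main__":
    # The resulting prompt string would then be sent to whichever LLM is being evaluated.
    print(build_fact_check_prompt(
        claim="Malaria can be transmitted through mosquito bites.",
        evidence="Malaria parasites are spread to people through the bites of "
                 "infected female Anopheles mosquitoes.",
    ))
```

In practice, the in-context examples would be drawn from the dataset itself and the completed prompt passed to the model under evaluation; this sketch only shows the general shape of such a setup.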