This paper introduces DemosQA, a new Greek question answering dataset derived from social media to better represent Greek social and cultural context. The authors evaluate 11 monolingual and multilingual LLMs on 6 Greek QA datasets using a memory-efficient evaluation framework and three prompting strategies. Results show that performance varies significantly across models and datasets, highlighting the need for more targeted development and evaluation for under-resourced languages like Greek.
Monolingual LLMs can outperform multilingual models on culturally specific QA, challenging the assumption that multilingual models are universally superior.
Recent advancements in Natural Language Processing and Deep Learning have enabled the development of Large Language Models (LLMs), which have significantly advanced the state-of-the-art across a wide range of tasks, including Question Answering (QA). Despite these advancements, research on LLMs has primarily targeted high-resourced languages (e.g., English), and only recently has attention shifted toward multilingual models. However, these models demonstrate a training data bias towards a small number of popular languages or rely on transfer learning from high- to under-resourced languages; this may lead to a misrepresentation of social, cultural, and historical aspects. To address this challenge, monolingual LLMs have been developed for under-resourced languages; however, their effectiveness remains less studied when compared to multilingual counterparts on language-specific tasks. In this study, we address this research gap in Greek QA by contributing: (i) DemosQA, a novel dataset, which is constructed using social media user questions and community-reviewed answers to better capture the Greek social and cultural zeitgeist; (ii) a memory-efficient LLM evaluation framework adaptable to diverse QA datasets and languages; and (iii) an extensive evaluation of 11 monolingual and multilingual LLMs on 6 human-curated Greek QA datasets using 3 different prompting strategies. We release our code and data to facilitate reproducibility.