Search papers, labs, and topics across Lattice.
This paper introduces the first NLP dataset for Meenzerisch, a dying German dialect, derived from a 1966 dictionary and containing 2,351 dialect words paired with Standard German definitions. The authors then evaluate the ability of state-of-the-art large language models (LLMs) to generate definitions for dialect words and generate dialect words from definitions. Results show that LLMs perform poorly on both tasks, achieving less than 10% accuracy even with few-shot learning and rule extraction, underscoring the need for more resources and research on German dialects.
LLMs fail spectacularly at understanding and generating Meenzerisch, a German dialect, achieving less than 10% accuracy even with targeted interventions, revealing a significant gap in their linguistic capabilities for low-resource languages.
Meenzerisch, the dialect spoken in the German city of Mainz, is also the traditional language of the Mainz carnival, a yearly celebration well known throughout Germany. However, Meenzerisch is on the verge of dying out-a fate it shares with many other German dialects. Natural language processing (NLP) has the potential to help with the preservation and revival efforts of languages and dialects. However, so far no NLP research has looked at Meenzerisch. This work presents the first research in the field of NLP that is explicitly focused on the dialect of Mainz. We introduce a digital dictionary-an NLP-ready dataset derived from an existing resource (Schramm, 1966)-to support researchers in modeling and benchmarking the language. It contains 2,351 words in the dialect paired with their meanings described in Standard German. We then use this dataset to answer the following research questions: (1) Can state-of-the-art large language models (LLMs) generate definitions for dialect words? (2) Can LLMs generate words in Meenzerisch, given their definitions? Our experiments show that LLMs can do neither: the best model for definitions reaches only 6.27% accuracy and the best word generation model's accuracy is 1.51%. We then conduct two additional experiments in order to see if accuracy is improved by few-shot learning and by extracting rules from the training set, which are then passed to the LLM. While those approaches are able to improve the results, accuracy remains below 10%. This highlights that additional resources and an intensification of research efforts focused on German dialects are desperately needed.