This paper investigates the ability of LLMs to detect language ideologies in Luxembourgish news comments, a valuable task for understanding identity construction in multilingual societies. The authors manually annotate a corpus of Luxembourgish user comments with predefined ideological categories and evaluate how well LLMs replicate those annotations under varying prompt conditions. Results indicate that while LLMs are not yet fully optimized for multi-class ideological annotation, they can identify language ideological content, even in a small language that is poorly represented in their training data.
LLMs can identify language ideologies even in low-resource languages like Luxembourgish, offering a new tool for understanding identity construction in multilingual societies.
Detecting language ideologies is a valuable yet complex task for understanding how identities are constructed through discourse. In Luxembourg's multicultural and multilingual society, language ideologies reflect more than simple preferences: they carry deep cultural and social meanings, shaping identities and social belonging. Following recent developments in applying Natural Language Processing tools to linguistics and social science, this paper explores the potential of large language models to assist in the detection of language ideologies. We manually annotate a corpus of user comments in Luxembourgish with predefined ideological categories and then evaluate the performance of large language models under varying prompt conditions to assess their ability to replicate these human annotations. Since Luxembourgish is a small language and poorly represented in the LLMs' training data, we also investigate whether machine-translating the data into high-resource languages increases performance on the ideology detection task. Our findings suggest that, while LLMs are not yet fully optimized for a multi-class ideological annotation task, they are practical tools for identifying language ideological content.