MBZUAI | Sofia University "St. Kliment Ohridski" | Jun 9, 2026 | arXiv:2606.11316

Schützen: Evaluating LLM Safety in Bulgarian and German Contexts

Kiril Georgiev, Yuxia Wang, Dimitar Iliyanov Dimitrov, Preslav Nakov, Ivan Koychev

AI Summary

This paper introduces Schützen, a safety evaluation dataset designed for assessing large language models (LLMs) in Bulgarian and German contexts. It addresses the bias of existing safety evaluations toward English and Chinese. Through experiments with multilingual and language-specific LLMs, the authors uncover significant cross-language differences in safety behavior, underscoring the importance of context-aware evaluation resources. The findings emphasize the need for tailored safety assessments to mitigate the risks of deploying LLMs in diverse sociocultural environments.

Key Contribution

Cross-language safety evaluations reveal that LLMs exhibit starkly different risk profiles in Bulgarian compared to German, challenging the notion of universal model safety.
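To make the idea of a per-language "risk profile" concrete, here is a minimal Python sketch. It assumes each model response has already been judged harmful or not and tagged with its language and risk category. The `Judgement` record, its field names, the helper functions, and the category labels are all hypothetical and are not taken from the Schützen release. The authors' actual evaluation code is in the repository linked in the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgement:
    """One judged model response (hypothetical schema, not Schützen's actual format)."""
    language: str   # e.g. "bg" or "de"
    category: str   # risk area of the prompt (hypothetical labels)
    harmful: bool   # True if the response was judged unsafe


def risk_profile(judgements):
    """Return {language: {category: unsafe_rate}} for a list of Judgement records."""
    # counts[language][category] holds [harmful_count, total_count]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for j in judgements:
        cell = counts[j.language][j.category]
        cell[0] += int(j.harmful)
        cell[1] += 1
    return {
        lang: {cat: harmful / total for cat, (harmful, total) in cats.items()}
        for lang, cats in counts.items()
    }


def profile_gap(profile, lang_a, lang_b):
    """Per-category unsafe-rate difference (lang_a minus lang_b) over shared categories."""
    shared = profile[lang_a].keys() & profile[lang_b].keys()
    return {cat: profile[lang_a][cat] - profile[lang_b][cat] for cat in sorted(shared)}


if __name__ == "__main__":
    # Toy records for illustration only; these are NOT results from the paper.
    toy = [
        Judgement("bg", "misinformation", True),
        Judgement("bg", "misinformation", False),
        Judgement("de", "misinformation", False),
        Judgement("de", "misinformation", False),
    ]
    profile = risk_profile(toy)
    print(profile_gap(profile, "bg", "de"))  # prints {'misinformation': 0.5}
```

Comparing unsafe rates per category, rather than as a single aggregate, keeps a large gap in one risk area from being masked by agreement in the others.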

Abstract

Large language models are increasingly deployed across professional domains, bringing hard-to-predict risks, including the generation of harmful or disrespectful content. Although substantial progress has been made in developing safety evaluation datasets, existing resources remain overwhelmingly English- and Chinese-centric. This limitation is particularly pronounced when evaluating languages that operate within shared sociocultural, legal, and ethical contexts. To address this gap, we introduce Schützen: a German–Bulgarian safety dataset designed to assess model answerability under risk, covering both a low-resource language (Bulgarian) and a high-resource language (German). Experiments with multilingual and language-specific LLMs reveal pronounced cross-language differences in safety behavior, highlighting the necessity of tailored, region-specific evaluation resources to support the responsible deployment of LLMs in Germany and Bulgaria. Datasets and code are available at https://github.com/xnlp-lab/Schutzen. Warning: this paper contains examples that may be offensive, harmful, or biased.

Constitutional AI & AI Ethics | Eval Frameworks & Benchmarks | Natural Language Processing

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
