Dec 15, 2025, arXiv:2512.13298

MiniLingua: A Small Open-Source LLM for European Languages

Anna Aksenova, Boris Zverkov, Nicola Dainese, Alexander Nikitin, Pekka Marttinen

AI Summary

The paper introduces MiniLingua, a 1-billion-parameter multilingual LLM trained from scratch on 13 European languages, addressing the high computational cost, privacy concerns, and English-centric training of larger models. MiniLingua aims to balance language coverage with instruction-following capabilities in a smaller, more efficient model. The instruction-tuned version of MiniLingua outperforms EuroLLM on summarization, classification, and question answering tasks, while remaining competitive on open-ended generation.

Key Contribution

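A new 1B-parameter multilingual LLM whose instruction-tuned version outperforms EuroLLM, a model trained with a similar approach but a larger training budget, on summarization, classification, and question answering. This suggests that a small model trained from scratch on 13 European languages can match or exceed a more heavily resourced one on instruction-following tasks.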

Abstract

Large language models are powerful but often limited by high computational cost, privacy concerns, and English-centric training. Recent progress demonstrates that small, efficient models with around one billion parameters can deliver strong results and enable on-device use. This paper introduces MiniLingua, a multilingual open-source LLM of one billion parameters trained from scratch for 13 European languages, designed to balance coverage and instruction-following capabilities. Based on evaluation results, the instruction-tuned version of MiniLingua outperforms EuroLLM, a model with a similar training approach but a larger training budget, on summarization, classification and both open- and closed-book question answering. Moreover, it remains competitive with more advanced state-of-the-art models on open-ended generation tasks. We release model weights, tokenizer and source code used for data processing and model training.

Architecture Design (Transformers, SSMs, MoE); Natural Language Processing; Open-Source Models & Weights

Citation Metrics

Citations: 0

Influential citations: 0

References: 49

Year: 2025

Venue: arXiv.org
