National Information Processing Institute
Mar 12, 2026
arXiv:2603.12191

Long-Context Encoder Models for Polish Language Understanding

Sławomir Dadas, Rafał Poświata, Marek Kozłowski, Małgorzata Grębowiec, Michał Perełkiewicz, Paweł Klimiuk, Przemysław Boruta

AI Summary

A long-context encoder model for Polish language understanding was developed using a two-stage training procedure involving positional embedding adaptation and full-parameter continuous pre-training. Knowledge distillation was used to create compressed model variants. Evaluated on 25 tasks, including KLEJ and FinBench, the model achieves the best average performance among Polish and multilingual models, with the largest gains on long-context tasks.

Key Contribution

Polish language understanding gets a long-context boost: a new encoder model handles sequences up to 8192 tokens, outperforming existing models on long documents while remaining competitive on shorter texts.

Abstract

While decoder-only Large Language Models (LLMs) have recently dominated the NLP landscape, encoder-only architectures remain a cost-effective and parameter-efficient standard for discriminative tasks. However, classic encoders like BERT are limited by a short context window, which is insufficient for processing long documents. In this paper, we address this limitation for Polish by introducing a high-quality Polish model capable of processing sequences of up to 8192 tokens. The model was developed using a two-stage training procedure that involves positional embedding adaptation and full-parameter continuous pre-training. Furthermore, we propose compressed model variants trained via knowledge distillation. The models were evaluated on 25 tasks, including the KLEJ benchmark, a newly introduced financial task suite (FinBench), and other classification and regression tasks, specifically those requiring long-document understanding. The results demonstrate that our model achieves the best average performance among Polish and multilingual models, significantly outperforming competitive solutions in long-context tasks while maintaining comparable quality on short texts.

Architecture Design (Transformers, SSMs, MoE), Natural Language Processing, Open-Source Models & Weights

Citation Metrics

Citations: 0

Influential citations: 0

References: 36

Year: 2026

Venue: N/A
