Mar 12, 2026 · arXiv:2603.11881

Bielik-Minitron-7B: Compressing Large Language Models via Structured Pruning and Knowledge Distillation for the Polish Language

Remigiusz Kinas, Paweł Kiszczak, Sergio P. Perez, Krzysztof Ociepa, Łukasz Flis, Krzysztof Wróbel, Adrian Gwoździej

AI Summary

Bielik-Minitron-7B, a compressed version of Bielik-11B-v3.0, was created in two stages: structured hybrid pruning with NVIDIA Model Optimizer, followed by logit-based knowledge distillation with NVIDIA NeMo. This reduced the parameter count by 33.4% while retaining approximately 90% of the original model's performance and delivering up to a 50% inference speedup. The model was then aligned with SFT, DPO-P, and GRPO.
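The summary names structured pruning without describing it. As a minimal sketch, assuming a Minitron-style activation-based importance score, the code below width-prunes a single two-layer MLP: it scores each hidden neuron by its mean absolute activation over a small calibration batch, keeps the top `keep` neurons, and deletes the matching rows of the up-projection and columns of the down-projection. The function names, the mean-absolute-activation score, and the toy weights are hypothetical illustrations. This is not the NVIDIA Model Optimizer API, and it covers only MLP width pruning, not whatever depth, attention-head, or embedding choices the "hybrid" scheme makes.

```python
def neuron_importance(activations):
    """Mean absolute activation per hidden neuron (hypothetical importance score).

    activations: [tokens][hidden] list, one row per calibration token.
    """
    n_tokens = len(activations)
    n_neurons = len(activations[0])
    return [
        sum(abs(row[j]) for row in activations) / n_tokens
        for j in range(n_neurons)
    ]


def prune_mlp_width(w_up, w_down, activations, keep):
    """Keep the `keep` highest-scoring hidden neurons of y = W_down @ f(W_up @ x).

    w_up:   [hidden][d_model], one row per hidden neuron
    w_down: [d_model][hidden], one column per hidden neuron
    Whole neurons are removed, so both pruned matrices stay dense.
    """
    scores = neuron_importance(activations)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    kept = sorted(ranked[:keep])  # restore original neuron order
    new_up = [w_up[j] for j in kept]
    new_down = [[row[j] for j in kept] for row in w_down]
    return new_up, new_down, kept


# Toy MLP (hypothetical values): d_model = 2, hidden = 4.
w_up = [[1, 0], [0, 1], [1, 1], [2, 2]]   # [hidden][d_model]
w_down = [[1, 2, 3, 4], [5, 6, 7, 8]]     # [d_model][hidden]
# Hidden-layer activations for two calibration tokens: [tokens][hidden].
acts = [[0.1, -3.0, 0.0, 2.5], [0.2, 1.0, 0.1, -2.0]]

new_up, new_down, kept = prune_mlp_width(w_up, w_down, acts, keep=2)
print(kept, new_up, new_down)
```

Because entire neurons disappear, the pruned layer is simply a smaller dense layer that runs on standard kernels; this is what separates structured pruning from unstructured weight sparsity.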

Key Contribution

Achieves up to a 50% inference speedup for a large language model targeting European languages by compressing it to 7.35B parameters, while retaining approximately 90% of the original 11B-parameter model's performance.
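The compression figures can be checked from the parameter counts given in the abstract (11.04B before, 7.35B after). The last computation is a hedged back-of-the-envelope estimate, not how the paper measured its speedup: it assumes decoding is limited purely by reading the weights from memory, in which case throughput scales inversely with parameter count.

```python
def reduction_pct(original_b: float, compressed_b: float) -> float:
    """Share of parameters removed, in percent, from counts in billions."""
    return 100.0 * (1.0 - compressed_b / original_b)


ORIGINAL_B = 11.04   # Bielik-11B-v3.0 parameter count (billions), from the abstract
COMPRESSED_B = 7.35  # Bielik-Minitron-7B parameter count (billions), from the abstract

removed_b = ORIGINAL_B - COMPRESSED_B
pct = reduction_pct(ORIGINAL_B, COMPRESSED_B)
# Assumption: weight-bandwidth-bound decoding, so throughput ~ 1 / parameter count.
ideal_speedup = ORIGINAL_B / COMPRESSED_B

print(f"removed {removed_b:.2f}B parameters ({pct:.1f}%)")
print(f"ideal bandwidth-bound speedup: {ideal_speedup:.2f}x")
```

This prints a 33.4% reduction (3.69B parameters removed), matching the reported figure. The idealized 1.50x throughput ratio is consistent with an "up to 50%" speedup, but real speedups also depend on attention cost, KV-cache size, batch size, and kernels.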

Abstract

This report details the creation of Bielik-Minitron-7B, a compressed 7.35B-parameter version of the Bielik-11B-v3.0 model, specifically optimized for European languages. Using a two-stage compression methodology inspired by the NVIDIA Minitron approach, we combined structured hybrid pruning and knowledge distillation to reduce the model's parameter count by 33.4%, from 11.04B to 7.35B. We used the NVIDIA Model Optimizer for structured pruning and the NVIDIA NeMo Framework for logit-based distillation to recover quality. Following distillation, the model underwent a rigorous alignment pipeline consisting of Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO-P), and reinforcement learning with Group Relative Policy Optimization (GRPO). The final model recovered approximately 90% of the baseline model's performance while providing up to a 50% inference speedup. This approach demonstrates an efficient pathway to creating language models for less-represented languages, largely preserving the original model's quality while reducing inference deployment costs.
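The abstract says only that the distillation is "logit-based". As a hedged sketch of what such an objective typically looks like, the code below computes the forward KL divergence KL(p_teacher || p_student) between temperature-softened next-token distributions at each position and averages it over the sequence. Forward KL, the temperature `T`, the `T**2` scaling (from Hinton et al.'s original distillation formulation), and all function names are assumptions for illustration, not the NeMo Framework API or the exact objective used for Bielik-Minitron-7B.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of one logit vector at temperature T."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def kd_loss(student_logits, teacher_logits, temperature=1.0):
    """Forward KL(teacher || student) over the vocabulary at one token position."""
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return kl * temperature ** 2


def sequence_kd_loss(student_seq, teacher_seq, temperature=1.0):
    """Average per-token KD loss over a sequence of logit vectors."""
    losses = [kd_loss(s, t, temperature) for s, t in zip(student_seq, teacher_seq)]
    return sum(losses) / len(losses)


# Toy 3-token vocabulary: the student is far from a confident teacher.
print(kd_loss([0.0, 0.0, 0.0], [3.0, 0.0, 0.0], temperature=2.0))
```

The loss is zero whenever the student's logits equal the teacher's up to a constant shift, since softmax ignores such shifts. Only the student receives gradients; the teacher's logits act as fixed soft targets.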

Inference & Quantization · Natural Language Processing · Open-Source Models & Weights

Citation Metrics

Citations: 0

Influential citations: 0

References: 28

Year: 2026

Venue: N/A
