Automation and Information Technologies; Department of Automated Systems for Data; Department of Data Analysis and Programming; Dmukhtasibovich - Doctor of Physical and Mathematical; Institute of Management; Kazan Federal University; Kazan National Research Technological; Nusratullovich - Doctor of Technical; Professor; UNIVERSITY INSTITUTE OF COMPUTATIONAL

May 6, 2026, arXiv:2605.04948

Adapting Large Language Models to a Low-Resource Agglutinative Language: A Comparative Study of LoRA and QLoRA for Bashkir

Mullosharaf K. Arabov, Svetlana S. Khaybullina

AI Summary

This paper benchmarks LoRA and QLoRA for adapting large language models to Bashkir, a low-resource agglutinative language, using a corpus of 71k documents. The authors find that QLoRA applied to Mistral-7B and Phi-2 reaches test perplexity (3.79 and 3.81) comparable to full fine-tuning of the smaller GPT-2 medium (3.34), with over 40 times fewer trainable parameters. However, PEFT performance is highly sensitive to the base model and its tokenizer, and some architectures show significant quality degradation (e.g., DeepSeek-7B with rank 8, perplexity 129.55).

Key Contribution

QLoRA on larger models such as Mistral-7B and Phi-2 can approach the perplexity of a fully fine-tuned smaller model (GPT-2 medium) for a low-resource language, with over 40 times fewer trainable parameters.
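To illustrate why low-rank adapters train so few parameters, the following minimal sketch counts the weights a single LoRA adapter adds to one linear layer. The layer shape and rank are hypothetical values chosen for the arithmetic only; they are not the paper's configuration, and the function names are illustrative.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Weights added by one LoRA adapter on a (d_out x d_in) linear layer.

    LoRA learns two low-rank factors, A (rank x d_in) and B (d_out x rank),
    while the original weight matrix stays frozen.
    """
    return rank * (d_in + d_out)


def dense_params(d_in: int, d_out: int) -> int:
    """Weights updated when the same layer is fully fine-tuned (bias ignored)."""
    return d_in * d_out


# Hypothetical layer shape and rank, for illustration only.
d_in, d_out, rank = 4096, 4096, 8

adapter = lora_trainable_params(d_in, d_out, rank)
dense = dense_params(d_in, d_out)
reduction = dense / adapter

print(f"adapter weights: {adapter}")
print(f"dense weights:   {dense}")
print(f"reduction:       {reduction:.0f}x")
```

The per-layer reduction shown here is not the same quantity as the figure of over 40 times quoted above, which concerns total trainable parameters across the compared configurations.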

Abstract

This paper presents a comparative study of parameter-efficient fine-tuning (PEFT) methods, including LoRA and QLoRA, applied to the task of adapting large language models to the Bashkir language, a low-resource agglutinative language of the Turkic family. Experimental evaluation is conducted on a Bashkir text corpus of 71k documents (46.9M tokens) using models of various architectures: DistilGPT2, GPT-2 (base, medium), Phi-2, Qwen2.5-7B, DeepSeek-7B, and Mistral-7B. To improve the reliability of results, each configuration was trained with three different random seeds. The lowest perplexity on the test set was obtained for GPT-2 medium with full fine-tuning (3.34). Meanwhile, QLoRA applied to Mistral-7B (3.79) and Phi-2 (3.81) achieved comparable quality with over 40 times fewer trainable parameters. However, we also observed cases of significant quality degradation when using PEFT for certain architectures (e.g., DeepSeek-7B with rank 8, perplexity = 129.55), indicating that the outcome depends critically on the choice of the base model and its tokenizer. Additionally, a qualitative analysis of generated texts based on Bashkir prompts revealed that models with the best perplexity do not necessarily produce the most coherent outputs: QLoRA-tuned models generated monolingual Bashkir continuations, whereas the fully fine-tuned model with the lowest perplexity frequently switched to English. The results suggest that QLoRA on 7B-scale models offers an effective compromise between quality and computational cost for Bashkir. To ensure reproducibility, open data, code, and trained adapters will be released upon acceptance.
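The abstract reports test-set perplexity for configurations trained with three random seeds. As a minimal sketch of how such numbers are commonly obtained, the code below computes perplexity as the exponential of the mean per-token negative log-likelihood and then summarizes values across seeds. The function names and all numeric inputs are synthetic placeholders, not results or code from the paper, and the source does not state which normalization the authors used.

```python
import math
import statistics


def perplexity(total_nll: float, n_tokens: int) -> float:
    """Perplexity = exp(total negative log-likelihood / number of tokens)."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return math.exp(total_nll / n_tokens)


def summarize_seeds(values: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of a metric across random seeds."""
    return statistics.mean(values), statistics.stdev(values)


# Synthetic check: a model assigning probability 1/4 to every token
# has a per-token NLL of ln(4), so its perplexity is 4.
n_tokens = 1000
total_nll = n_tokens * math.log(4.0)
ppl = perplexity(total_nll, n_tokens)

# Synthetic per-seed values (placeholders, not the paper's numbers).
mean_ppl, std_ppl = summarize_seeds([10.0, 12.0, 14.0])

print(f"perplexity: {ppl:.4f}")
print(f"across seeds: mean={mean_ppl:.2f}, std={std_ppl:.2f}")
```

Because the token count depends on the tokenizer, per-token perplexity computed this way is not directly comparable between models with different vocabularies, which is one reason tokenizer choice matters when reading such comparisons.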

Inference & Quantization; Natural Language Processing; Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
