This paper introduces TriMix, a test-time logit fusion framework that adapts large language models to low-resource languages by dynamically combining three sources: a small continually pretrained model, high-resource language instruction tuning, and a large model. It addresses a limitation of Proxy Tuning, which often fails in low-resource settings because the large model's weak low-resource language competence can overwhelm the knowledge of the specialized small model. Experiments across four model families and eight low-resource languages show that TriMix consistently outperforms single-model baselines and Proxy Tuning, and the analysis points to prioritizing the small specialized model's logits as the key factor.
TriMix finds that prioritizing the logits of a small, language-specialized model is crucial for low-resource language adaptation, challenging the assumption that the large model should dominate.
Adapting large language models (LLMs) to low-resource languages (LRLs) is constrained by the scarcity of task data and computational resources. Although Proxy Tuning offers a logit-level strategy for introducing scaling effects, it often fails in LRL settings because the large model's weak LRL competence might overwhelm the knowledge of specialized smaller models. We thus propose TriMix, a test-time logit fusion framework that dynamically balances capabilities from three different sources: LRL competence from a continually pretrained small model, task competence from high-resource language instruction tuning, and the scaling benefits of large models. It is data- and compute-efficient: it requires no LRL task annotations and only continual pretraining of a small model. Experiments across four model families and eight LRLs show that TriMix consistently outperforms single-model baselines and Proxy Tuning. Our analysis reveals that prioritizing the small LRL-specialized model's logits is crucial for success, challenging the prevalent large-model-dominant assumption.
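To make the idea of logit-level fusion concrete, here is a minimal sketch on a toy three-token vocabulary. The `proxy_tuning` function follows the commonly described Proxy Tuning form (the large model's logits shifted by the difference between a tuned and an untuned small model). The `weighted_fusion` function is only a hypothetical illustration of balancing three logit sources with fixed weights; the abstract does not give TriMix's actual combination rule or how its weights are set dynamically, so the function names, weights, and toy logits below are all assumptions.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def proxy_tuning(large_base: List[float],
                 small_tuned: List[float],
                 small_base: List[float]) -> List[float]:
    """Proxy Tuning-style fusion: add the small model's
    tuned-minus-base logit offset to the large model's logits."""
    return [l + (t - b) for l, t, b in zip(large_base, small_tuned, small_base)]


def weighted_fusion(small_lrl: List[float],
                    small_task: List[float],
                    large: List[float],
                    w_lrl: float, w_task: float, w_large: float) -> List[float]:
    """Hypothetical three-source fusion: a fixed weighted sum of logits.
    This is an illustrative assumption, not the paper's formula."""
    return [w_lrl * a + w_task * b + w_large * c
            for a, b, c in zip(small_lrl, small_task, large)]


def argmax(values: List[float]) -> int:
    """Index of the largest value (the greedy next-token choice)."""
    return max(range(len(values)), key=lambda i: values[i])


# Toy logits: each source prefers a different token.
small_lrl = [4.0, 0.0, 0.0]   # LRL-specialized small model prefers token 0
small_task = [0.0, 1.0, 0.0]  # task-tuned source prefers token 1
large = [0.0, 0.0, 2.0]       # large model prefers token 2

# Weighting the LRL-specialized model most lets its preference win.
lrl_first = weighted_fusion(small_lrl, small_task, large, 1.0, 0.5, 0.5)
# Weighting the large model most lets it override the small model.
large_first = weighted_fusion(small_lrl, small_task, large, 0.25, 0.5, 2.0)

print(argmax(lrl_first), softmax(lrl_first))
print(argmax(large_first), softmax(large_first))
```

The two calls differ only in their weights, which is the point of the sketch: when the large model's logits dominate the sum, its preference overrides the language-specialized small model, which is the failure mode the abstract attributes to Proxy Tuning in LRL settings.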