Kyoto | Mar 17, 2026 | arXiv:2603.16622

Domain Mixture Design via Log-Likelihood Differences for Aligning Language Models with a Target Model

Ryo Kishino, Riku Shiomi, Hiroaki Yamagiwa, Momose Oyama, Hidetoshi Shimodaira

AI Summary

This paper introduces a method for aligning a base language model with a target model by choosing the domain mixture weights used during pretraining or continued pretraining. The weights are determined by treating models as points in log-likelihood space and aligning the training update direction with the direction toward the target model. Experiments with NanoGPT show that the method reduces KL divergence to the target model compared with uniform weighting over the Pile, and that downstream task performance tends to move closer to that of the target model. Knowledge distillation remains more effective when it is available.

Key Contribution

Without distilling directly, a base model can be moved toward a target model by choosing the mix of pretraining data from log-likelihood differences, although knowledge distillation remains more effective when it is available.

Abstract

Instead of directly distilling a language model, this study addresses the problem of aligning a base model with a target model in distribution by designing the domain mixture of training data for pretraining or continued pretraining as a fixed training recipe. We propose a method for determining domain weights by viewing models as points in log-likelihood space and aligning the training update direction with the direction toward the target model. Experiments with NanoGPT show that the proposed method consistently reduces the KL divergence to the target model compared with uniform weighting over the Pile. Although knowledge distillation remains more effective when available, the proposed method still achieves meaningful alignment, and downstream task performance also tends to become closer to that of the target model.
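The idea of aligning an update direction with a target direction in log-likelihood space can be illustrated with a small sketch. This is not the authors' algorithm; the abstract does not state the objective they optimize. The sketch assumes that each model is represented by a vector of log-likelihoods over a shared set of N texts, that a hypothetical per-domain direction (the change in that vector after training on one domain) is available for each of K domains, and that weights are chosen by least squares on the probability simplex. The names `fit_domain_weights`, `project_to_simplex`, `domain_dirs`, and `target_dir` are illustrative only.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    n = v.size
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, n + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def fit_domain_weights(domain_dirs, target_dir, steps=2000):
    """Assumed objective: minimize ||w @ domain_dirs - target_dir||^2 over the simplex.

    domain_dirs: (K, N) array, row k is the hypothetical change in per-text
                 log-likelihood caused by training on domain k.
    target_dir:  (N,) array, log-likelihood of the target model minus that
                 of the base model on the same texts.
    """
    k = domain_dirs.shape[0]
    w = np.full(k, 1.0 / k)                   # start from uniform weighting
    gram = domain_dirs @ domain_dirs.T
    rhs = domain_dirs @ target_dir
    lr = 1.0 / (np.linalg.norm(gram, 2) + 1e-12)  # 1 / Lipschitz constant
    for _ in range(steps):
        grad = gram @ w - rhs                 # gradient of the half squared error
        w = project_to_simplex(w - lr * grad)
    return w

# Synthetic check: the target direction is an exact mixture of the domain directions.
rng = np.random.default_rng(0)
domain_dirs = rng.normal(size=(3, 50))
true_w = np.array([0.7, 0.2, 0.1])
target_dir = true_w @ domain_dirs
w_hat = fit_domain_weights(domain_dirs, target_dir)
print(w_hat)
```

Least squares is sensitive to the scale of the directions. A cosine-similarity objective would compare directions only, and which of these (if either) matches the paper is not determined by the abstract.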

Data Curation & Synthetic Data | Natural Language Processing | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 29

Year: 2026

Venue: N/A
