This paper introduces ShapleyLaw, a game-theoretic multilingual scaling law that models cross-lingual transfer by treating each language as a player in a cooperative game. The Shapley value quantifies each language's contribution to the reduction in test loss, and that contribution serves as the measure of cross-lingual transfer. Experiments show that ShapleyLaw predicts model performance more accurately, and yields better language mixture ratios, than baseline methods that ignore cross-lingual transfer.
Optimizing multilingual training? Shapley values reveal the hidden cross-lingual transfer effects that current scaling laws miss, leading to better language mixture ratios.
In multilingual pretraining, the test loss of a pretrained model is heavily influenced by the proportion of each language in the pretraining data, namely the language mixture ratios. Multilingual scaling laws can predict the test loss under different language mixture ratios and can therefore be used to estimate the optimal ratios. However, current approaches to multilingual scaling laws do not measure the cross-lingual transfer effect, resulting in suboptimal mixture ratios. In this paper, we consider multilingual pretraining as a cooperative game in which each language acts as a player that jointly contributes to pretraining, gaining the resulting reduction in test loss as the payoff. Consequently, from the perspective of cooperative game theory, we quantify the cross-lingual transfer from each language by its contribution in the game, and propose a game-theoretic multilingual scaling law called ShapleyLaw. Our experiments show that ShapleyLaw outperforms baseline methods in model performance prediction and language mixture optimization.
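To make the cooperative-game framing concrete, the sketch below computes exact Shapley values for a toy three-language game. It illustrates the standard Shapley value definition only, not the ShapleyLaw formulation itself, which the abstract does not spell out. The function name `shapley_values`, the language codes, and every payoff number in `TOY_LOSS_REDUCTION` are hypothetical and chosen purely for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a cooperative game.

    players: list of hashable player ids (here, language codes).
    value:   callable mapping a frozenset of players to a payoff.
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Weight of a coalition of size k that p joins: k!(n-k-1)!/n!
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                coalition = frozenset(subset)
                # Marginal contribution of p to this coalition.
                phi[p] += weight * (value(coalition | {p}) - value(coalition))
    return phi


# Hypothetical payoffs: reduction in test loss obtained when pretraining
# on each subset of languages. These numbers are invented for the example.
TOY_LOSS_REDUCTION = {
    frozenset(): 0.0,
    frozenset({"en"}): 0.50,
    frozenset({"de"}): 0.30,
    frozenset({"sw"}): 0.10,
    frozenset({"en", "de"}): 0.90,
    frozenset({"en", "sw"}): 0.65,
    frozenset({"de", "sw"}): 0.45,
    frozenset({"en", "de", "sw"}): 1.10,
}

phi = shapley_values(["en", "de", "sw"], TOY_LOSS_REDUCTION.__getitem__)
for lang, contribution in phi.items():
    print(f"{lang}: {contribution:.3f}")
```

In this toy game the contributions sum to the payoff of the full coalition (1.10), which is the efficiency property of the Shapley value. Each language is also credited with more than its standalone payoff (for example, 0.575 versus 0.50 for `en`), because the assumed coalition payoffs are superadditive; this surplus is one simple way to read a positive transfer effect. Exact enumeration scales exponentially in the number of players, so a realistic number of languages would likely require sampling-based approximation.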