This paper investigates cross-lingual knowledge transfer in LLMs and finds that differences in script, rather than in language, are the primary barrier. Through regression analysis on the ECLeKTic and MultiLoKo datasets, the authors show that script match is the strongest predictor of successful knowledge transfer, even after controlling for model capability and question difficulty. They then demonstrate that targeted SFT, designed to improve reasoning about transliteration ambiguities, reduces the cross-script transfer gap, suggesting that post-training can enhance cross-lingual parametric knowledge transfer.
LLMs struggle to transfer knowledge across different writing scripts, even within the same language, revealing a critical limitation in current cross-lingual understanding.
In this work, we analyze shortcomings in cross-lingual knowledge transfer in large, modern reasoning LLMs. We demonstrate that the perceived gap in knowledge transfer is primarily a script barrier. First, we conduct an observational data analysis of the performance of thinking models on two datasets with local knowledge from around the world, ECLeKTic and MultiLoKo. Our regression analysis shows that whether the scripts match (not the language or the language family) is the primary predictor of knowledge transfer failure once model capability and question difficulty are accounted for. We corroborate this finding by providing the LLMs with the key entities of the questions in their source language, and find that this disproportionately improves performance on cross-script questions. We then posit that these LLMs could be reasoning better at test time. To evaluate this, we develop a synthetic generation pipeline that produces SFT samples encouraging the model to reason more carefully about transliteration ambiguities when retrieving parametric knowledge at inference time. We show that teaching two models to reason better reduces the cross-script transfer gap. We conclude that there is potential to improve cross-lingual parametric knowledge transfer during post-training.
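To make the notion of a script match concrete, the following is a minimal Python sketch of how such a binary feature could be derived for a source/target text pair. It is an illustration only, not the authors' pipeline: the function names `dominant_script` and `script_match` are hypothetical, and taking the first word of each character's Unicode name as its script label is a rough heuristic assumed here for simplicity.

```python
import unicodedata
from collections import Counter


def dominant_script(text: str) -> str:
    """Return a rough script label for the most frequent script in `text`.

    Heuristic (an assumption of this sketch): the first word of a letter's
    Unicode character name, e.g. 'LATIN', 'CYRILLIC', 'DEVANAGARI'.
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():  # skip digits, punctuation, spaces, combining marks
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


def script_match(source_text: str, target_text: str) -> bool:
    """True when both texts are written predominantly in the same script."""
    return dominant_script(source_text) == dominant_script(target_text)
```

A feature like this could serve as one binary covariate alongside model capability and question difficulty in a regression of the kind the abstract describes. For example, `script_match("Hindi", "हिन्दी")` evaluates to `False` under this heuristic, whereas `script_match("Madrid", "Berlin")` evaluates to `True`.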