The paper investigates the impact of parallel data on cross-lingual alignment in multilingual pretraining. By training models with varying amounts of parallel data, the authors find that parallel data has a surprisingly limited effect on the cross-lingual alignment the model ultimately achieves. Its primary benefits appear to be accelerating alignment early in training and reducing the number of language-specific neurons, but comparable alignment emerges even without parallel data.
Forget massive parallel datasets: cross-lingual alignment in multilingual models emerges almost as effectively without them.
Shared multilingual representations are essential for cross-lingual tasks and for knowledge transfer across languages. This study examines the impact of parallel data, i.e. translated sentences, as a pretraining signal for triggering representations that are aligned across languages. We train reference models with different proportions of parallel data and show that parallel data seems to have only a minimal effect on the cross-lingual alignment of the final model. Based on multiple evaluation methods, we find that its effect is limited to potentially accelerating representation sharing in the early phases of pretraining and to decreasing the number of language-specific neurons in the model. Cross-lingual alignment emerges at similar levels even without the explicit signal from parallel data.
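As a rough illustration of one common way cross-lingual alignment is probed (not necessarily the exact evaluation used in this paper), the sketch below compares mean-pooled hidden states of a sentence and its translation at each layer of a multilingual encoder; higher cosine similarity suggests more shared representations. The model name and the sentence pair are placeholder assumptions.

```python
# Hedged sketch: probe cross-lingual alignment by comparing hidden states of a
# sentence and its translation. Illustrative only; model and sentences are
# placeholders, not the paper's actual setup.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "xlm-roberta-base"  # any multilingual encoder; placeholder choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def sentence_embedding(text: str, layer: int) -> torch.Tensor:
    """Mean-pool the hidden states of a given layer over non-padding tokens."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).hidden_states[layer]  # (1, seq_len, dim)
    mask = inputs["attention_mask"].unsqueeze(-1)      # (1, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

# Hypothetical parallel sentence pair (English / German).
en = "The cat sleeps on the sofa."
de = "Die Katze schläft auf dem Sofa."

# hidden_states has num_hidden_layers + 1 entries (embeddings + each layer).
for layer in range(model.config.num_hidden_layers + 1):
    sim = torch.cosine_similarity(
        sentence_embedding(en, layer), sentence_embedding(de, layer)
    ).item()
    print(f"layer {layer:2d}: cosine similarity = {sim:.3f}")
```

Averaging such similarities over many parallel pairs, and tracking them across pretraining checkpoints, is one way the early-training acceleration described above could be observed.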