The paper introduces LabelPigeon, a framework that jointly performs machine translation and label projection by representing annotations as XML tags. The authors demonstrate that this joint approach improves label projection accuracy and also enhances translation quality across a wide range of languages and annotation complexities. In both direct evaluation and downstream tasks, LabelPigeon achieves substantial gains in cross-lingual transfer over traditional methods.
Injecting XML annotation tags directly into machine translation boosts both translation quality and cross-lingual transfer performance, with gains of up to +39.9 F1 on NER.
Label projection is an effective technique for cross-lingual transfer, extending span-annotated datasets from a high-resource language to low-resource ones. Most approaches perform label projection as a separate step after machine translation, and prior work that combines the two reports degraded translation quality. We re-evaluate this claim with LabelPigeon, a novel framework that jointly performs translation and label projection via XML tags. We design a direct evaluation scheme for label projection, and find that LabelPigeon outperforms baselines and actively improves translation quality in 11 languages. We further assess translation quality across 203 languages and varying annotation complexity, finding consistent improvement attributed to additional fine-tuning. Finally, across 27 languages and three downstream tasks, we report substantial gains in cross-lingual transfer over comparable work, up to +39.9 F1 on NER. Overall, our results demonstrate that XML-tagged label projection provides effective and efficient label transfer without compromising translation quality.
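As a rough illustration of the idea of carrying span labels through translation as XML tags, the sketch below wraps labeled spans in tags before translation and recovers them from a tagged translation afterwards. It is not the authors' implementation: the function names `insert_tags` and `extract_spans`, the tag names `PER` and `LOC`, and the hard-coded German output standing in for a translation model are all assumptions made for this example, and it handles only flat, non-overlapping spans.

```python
import re
from typing import List, Tuple

# A span is (start, end, label) with an exclusive end offset.
Span = Tuple[int, int, str]

# Matches one flat tagged span such as <PER>Angela Merkel</PER>.
TAG_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def insert_tags(text: str, spans: List[Span]) -> str:
    """Wrap each non-overlapping span in <label>...</label> tags."""
    pieces, cursor = [], 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])
        pieces.append(f"<{label}>{text[start:end]}</{label}>")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


def extract_spans(tagged: str) -> Tuple[str, List[Span]]:
    """Strip tags and return the plain text plus spans as offsets into it."""
    pieces, spans = [], []
    cursor = 0   # position in the tagged string
    offset = 0   # position in the plain (untagged) string
    for match in TAG_RE.finditer(tagged):
        before = tagged[cursor:match.start()]
        pieces.append(before)
        offset += len(before)
        label, inner = match.group(1), match.group(2)
        spans.append((offset, offset + len(inner), label))
        pieces.append(inner)
        offset += len(inner)
        cursor = match.end()
    pieces.append(tagged[cursor:])
    return "".join(pieces), spans


source = "Angela Merkel visited Paris."
source_spans = [(0, 13, "PER"), (22, 27, "LOC")]
tagged_source = insert_tags(source, source_spans)

# Hypothetical output of a translation model that preserves the tags;
# no real model is called in this sketch.
tagged_translation = "<PER>Angela Merkel</PER> besuchte <LOC>Paris</LOC>."
target, target_spans = extract_spans(tagged_translation)
```

In this sketch the projected labels come directly from wherever the tags land in the translated text, so no separate alignment step is needed after translation.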