This paper proposes a reinforcement learning (RL) approach to translating unseen languages from rich linguistic context, addressing the tendency of prior methods to overfit to specific languages. The authors show that their RL-trained models outperform in-context learning and supervised fine-tuning on extremely low-resource languages, even though the reward is only a lightweight surface-level translation metric (chrF). Their findings suggest that outcome-based RL can support language learning from context, extending its applicability beyond conventional reasoning tasks such as math and coding.
RL-trained models translate completely unseen languages better than in-context learning or supervised fine-tuning by learning to extract and apply linguistic knowledge provided in context.
Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by undergoing continued training or even by encoding a grammar book in their context. However, both methods typically overfit to specific languages, with limited zero-shot transfer at test time. To translate extremely low-resource languages at scale, we argue that LLMs must acquire the meta-skill of utilizing in-context linguistic knowledge rather than memorizing specific languages. In this paper, we propose a reinforcement learning (RL) approach to unseen language translation given rich linguistic context, using a surface-level translation metric (chrF) as the reward. Empirically, despite the lightweight reward, our RL-trained models effectively extract and apply relevant linguistic information from the provided context, leading to better translations on completely unseen languages than in-context learning or supervised fine-tuning. Our analyses suggest that outcome-based RL can extend beyond conventional reasoning tasks like math and coding to serve as a recipe for language learning from context.
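
To make the reward signal concrete, the following is a minimal sketch of a chrF-style score used as a scalar reward. It is not the authors' implementation: the abstract does not specify the chrF configuration, so the function name `chrf_reward`, the maximum n-gram order of 6, the recall weight beta = 2, and the choice to ignore whitespace are all assumptions made for illustration. The sketch averages character n-gram precision and recall over the orders and combines them into an F-beta score.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring whitespace (an assumption of this sketch)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf_reward(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF in [0, 1]; hypothetical helper, not a library API."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            # Skip orders longer than either string.
            continue
        # Clipped overlap: each n-gram counts at most as often as in the reference.
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    # F-beta: beta > 1 weights recall more heavily than precision.
    return (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    for candidate in ["the cat sat on the mat", "a cat sat on a mat", "dogs bark"]:
        print(f"{candidate!r}: {chrf_reward(candidate, reference):.3f}")
```

In an RL setup of the kind the abstract describes, a score like this would be computed between each sampled translation and its reference and passed to the policy-gradient update as the reward. Because the score only measures surface overlap, it does not check directly whether the model used the provided linguistic context, which is why the abstract calls the reward lightweight.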