This paper introduces Luar, a Language Understanding Boundary-aware Reinforcement Learning framework that trains reasoning language models (RLMs) to translate non-English inputs into English only when their direct understanding is insufficient. Training the model to judge whether its comprehension of the original input is reliable improves performance on multilingual reasoning benchmarks. The largest gains come on low-resource languages. Luar outperforms standard GRPO and other training-based baselines, and it avoids unnecessary translation when direct reasoning is already sufficient.
RLMs can learn to invoke translation only when their direct understanding falters, leading to improved performance on multilingual reasoning tasks.
Reasoning language models (RLMs) achieve strong performance on complex reasoning tasks, but still exhibit substantial multilingual reasoning gaps, largely due to language-understanding failures on non-English inputs. English translation can mitigate these failures by expressing non-English inputs in a form that RLMs can more reliably interpret, yet translating every input is unnecessary when the model can reason reliably from the original query. To address this challenge, we propose Luar, a Language Understanding Boundary-aware Reinforcement Learning framework that trains RLMs to selectively invoke translation when direct understanding is unreliable. Luar trains the model to choose between solving the original input directly and reasoning over its English translation, encouraging translation only when translator-augmented reasoning is expected to substantially outperform direct reasoning. Across multilingual reasoning benchmarks, Luar outperforms standard GRPO and other training-based baselines, with particularly large gains on low-resource languages. Further analysis shows that Luar avoids unnecessary translation in cases where direct reasoning is sufficient, and that its translator-call behavior extends to unseen low-resource languages. Together, these results suggest a selective approach to multilingual reasoning: RLMs can learn to invoke translation only when their direct understanding is unreliable. The project will be made publicly available at https://github.com/deokhk/LUAR
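The abstract does not specify Luar's reward, so the following is only an illustrative sketch under stated assumptions, not the paper's method. It shows one way to make "translate only when translator-augmented reasoning substantially outperforms direct reasoning" concrete. For a single query, sample several rollouts in each mode and record whether each final answer is correct (1) or wrong (0). The query is then labeled as needing translation only if the translated mode's accuracy exceeds the direct mode's accuracy by more than a margin. The names `accuracy`, `prefer_translation`, and `margin` are hypothetical, and the 0.2 default is an arbitrary assumption. `group_relative_advantages` shows the usual group normalization used by GRPO, the baseline named above. It is included for context and is not Luar's objective.

```python
import math
from typing import List, Sequence


def accuracy(outcomes: Sequence[int]) -> float:
    """Fraction of sampled rollouts whose final answer was correct (1) rather than wrong (0)."""
    return sum(outcomes) / len(outcomes)


def prefer_translation(direct: Sequence[int], translated: Sequence[int],
                       margin: float = 0.2) -> bool:
    """Hypothetical boundary test: prefer translation only when translator-augmented
    rollouts beat direct rollouts by more than `margin` (an assumed threshold)."""
    return accuracy(translated) - accuracy(direct) > margin


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: center each reward on its group mean and scale by the
    group standard deviation; a group with identical rewards gets all-zero advantages."""
    mu = sum(rewards) / len(rewards)
    sigma = math.sqrt(sum((r - mu) ** 2 for r in rewards) / len(rewards))
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # A query the model misreads directly but solves after translation.
    print(prefer_translation(direct=[1, 0, 0, 0], translated=[1, 1, 1, 0]))  # True
    # A query the model already solves directly, so translation adds nothing.
    print(prefer_translation(direct=[1, 1, 1, 1], translated=[1, 1, 1, 1]))  # False
    # Within a group, correct rollouts get positive advantages and wrong ones negative.
    print(group_relative_advantages([1, 0, 0, 0]))
```

The margin captures the "substantially outperform" condition. A small accuracy gain from translation does not justify the extra translator call, so such queries stay in direct mode.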