Mila · Cohere · Feb 16, 2026 · arXiv:2602.14763

Unlocking Reasoning Capability on Machine Translation in Large Language Models

Sara Rajaee, Sebastian Vincent, Alexandre Berard, Marzieh Fadaee, Kelly Marchisio

AI Summary

The paper evaluates reasoning-oriented large language models (RLMs) on machine translation (MT) using the WMT24++ benchmark and finds that enabling explicit reasoning consistently degrades translation quality. The authors observe that MT reasoning traces are highly linear and lack revision, self-correction, and exploration of alternative translations. To address this, they propose a structured reasoning framework built on multi-step drafting, adequacy refinement, fluency improvement, and selective iterative revision, and they curate a synthetic dataset of such traces to post-train a large reasoning model.

Key Contribution

Generic reasoning hurts machine translation, but a structured reasoning approach with iterative drafting, refinement, and revision yields significant improvements over standard translation fine-tuning and injected generic reasoning baselines.

Abstract

Reasoning-oriented large language models (RLMs) achieve strong gains on tasks such as mathematics and coding by generating explicit intermediate reasoning. However, their impact on machine translation (MT) remains underexplored. We systematically evaluate several open- and closed-weights RLMs on the WMT24++ benchmark and find that enabling explicit reasoning consistently degrades translation quality across languages and models. Analysis reveals that MT reasoning traces are highly linear, lacking revision, self-correction and exploration of alternative translations, which limits their usefulness. Furthermore, injecting higher-quality reasoning traces from stronger models does not reliably improve weaker models' performance. To address this mismatch, we propose a structured reasoning framework tailored to translation, based on multi-step drafting, adequacy refinement, fluency improvement, and selective iterative revision. We curate a synthetic dataset of dynamic structured reasoning traces and post-train a large reasoning model on this data. Experiments show significant improvements over standard translation fine-tuning and injected generic reasoning baselines. Our findings demonstrate that reasoning must be task-structured to benefit MT.
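The stages named in the abstract (drafting, adequacy refinement, fluency improvement, selective iterative revision) can be pictured as a small pipeline. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function `structured_translate`, the `Trace` container, the prompt wording, the `needs_revision` stopping check, and the `max_revisions` cap are all hypothetical names introduced for illustration, and `generate` stands in for any text-generation call.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type: any function that maps a prompt string to model output.
Generate = Callable[[str], str]
# Hypothetical check: decides whether the current translation needs another pass.
NeedsRevision = Callable[[str, str], bool]


@dataclass
class Trace:
    """Illustrative container for one structured reasoning trace."""
    source: str
    steps: List[str] = field(default_factory=list)  # stage outputs, in order
    final: str = ""


def structured_translate(
    source: str,
    target_lang: str,
    generate: Generate,
    needs_revision: NeedsRevision,
    max_revisions: int = 2,  # assumed cap; not taken from the paper
) -> Trace:
    trace = Trace(source=source)

    # Stage 1: initial draft.
    draft = generate(f"Draft a translation into {target_lang}:\n{source}")
    trace.steps.append(f"draft: {draft}")

    # Stage 2: adequacy refinement (is the meaning preserved?).
    adequate = generate(
        f"Fix meaning errors.\nSource: {source}\nDraft: {draft}"
    )
    trace.steps.append(f"adequacy: {adequate}")

    # Stage 3: fluency improvement (does it read naturally?).
    current = generate(
        f"Improve fluency in {target_lang} without changing meaning:\n{adequate}"
    )
    trace.steps.append(f"fluency: {current}")

    # Stage 4: selective iterative revision, run only while the check asks for it.
    for i in range(max_revisions):
        if not needs_revision(source, current):
            break
        current = generate(
            f"Revise the translation.\nSource: {source}\nCurrent: {current}"
        )
        trace.steps.append(f"revision {i + 1}: {current}")

    trace.final = current
    return trace
```

In this sketch the revision loop is what makes the trace "dynamic": easy inputs stop after three stages, while harder ones accumulate extra revision steps. How the paper actually decides when to revise is not stated in the abstract, so `needs_revision` is left as a caller-supplied placeholder.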

Eval Frameworks & Benchmarks · Natural Language Processing · Reasoning & Chain-of-Thought

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
