The paper introduces ExPosST, a framework for applying decoder-only LLMs to simultaneous machine translation (SimulMT) by explicitly allocating fixed positional slots for source tokens to address positional mismatch issues. ExPosST enables efficient decoding with KV caching across different positional encoding methods and introduces a policy-consistent fine-tuning strategy to align training with inference. Experiments across multiple language pairs demonstrate that ExPosST effectively supports simultaneous translation under diverse policies, resolving the dilemma between decoding efficiency and positional consistency.
Achieve efficient and positionally consistent simultaneous machine translation with LLMs, regardless of the positional encoding method, using a surprisingly simple explicit position allocation strategy.
Large language models (LLMs) have recently demonstrated promising performance in simultaneous machine translation (SimulMT). However, applying decoder-only LLMs to SimulMT introduces a positional mismatch, which leads to a dilemma between decoding efficiency and positional consistency. Existing approaches often rely on specific positional encodings or carefully designed prompting schemes, and thus fail to simultaneously achieve inference efficiency, positional consistency, and broad model compatibility. In this work, we propose ExPosST, a general framework that resolves this dilemma through explicit position allocation. ExPosST reserves fixed positional slots for incoming source tokens, enabling efficient decoding with KV cache across different positional encoding methods. To further bridge the gap between fine-tuning and inference, we introduce a policy-consistent fine-tuning strategy that aligns training with inference-time decoding behavior. Experiments across multiple language pairs demonstrate that ExPosST effectively supports simultaneous translation under diverse policies.
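To make the idea of explicit position allocation concrete, here is a minimal sketch of how fixed positional slots could keep position IDs stable while source tokens arrive during decoding. It is an illustration under stated assumptions, not the authors' implementation: the layout (prompt, then a reserved source block, then target positions), the names `SlotAllocator` and `wait_k_positions`, the `max_source_len` budget, and the choice of a wait-k read/write policy are all hypothetical and do not come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class SlotAllocator:
    """Hands out position IDs from two disjoint ranges (assumed layout).

    Source tokens fill a reserved block right after the prompt; target
    tokens start after that block. A token's position never changes once
    assigned, so its cached key/value entries would remain valid.
    """
    prompt_len: int
    max_source_len: int  # hypothetical size of the reserved source block
    source_seen: int = 0
    target_seen: int = 0

    def next_source_position(self) -> int:
        if self.source_seen >= self.max_source_len:
            raise ValueError("source exceeds the reserved slots")
        pos = self.prompt_len + self.source_seen
        self.source_seen += 1
        return pos

    def next_target_position(self) -> int:
        pos = self.prompt_len + self.max_source_len + self.target_seen
        self.target_seen += 1
        return pos


def wait_k_positions(num_source, num_target, k, prompt_len, max_source_len):
    """Simulate a wait-k schedule and record each token's position ID."""
    alloc = SlotAllocator(prompt_len, max_source_len)
    events = []
    read = written = 0
    while written < num_target:
        # READ until the source is k tokens ahead of the output, or exhausted.
        while read < min(written + k, num_source):
            events.append(("src", read, alloc.next_source_position()))
            read += 1
        # WRITE one target token.
        events.append(("tgt", written, alloc.next_target_position()))
        written += 1
    return events


if __name__ == "__main__":
    for kind, index, position in wait_k_positions(
        num_source=3, num_target=3, k=1, prompt_len=2, max_source_len=4
    ):
        print(kind, index, position)
```

In this sketch, a source token that arrives after some target tokens have already been generated lands in its own reserved slot, so no earlier token needs a new position and nothing in the cache would have to be recomputed. The cost of this assumed layout is that the source budget must be chosen in advance, and unused slots leave a gap between the source and target positions.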