The paper introduces ShipTraj-R1, an LLM-based framework for ship trajectory prediction that reformulates the problem as text-to-text generation with dynamic prompts incorporating information about conflicting ships. A rule-based reward mechanism incentivizes reasoning format and prediction accuracy, and the model is fine-tuned using Group Relative Policy Optimization (GRPO). Experiments on real-world maritime datasets demonstrate that ShipTraj-R1, using Qwen3 as a backbone, achieves state-of-the-art performance compared to deep learning and other LLM-based baselines.
LLMs can now predict ship trajectories with state-of-the-art accuracy, thanks to a novel framework that combines dynamic prompting, rule-based rewards, and group relative policy optimization.
Recent advances in reinforcement fine-tuning have significantly improved the reasoning ability of large language models (LLMs). In particular, methods such as group relative policy optimization (GRPO) have demonstrated strong capabilities across various fields. However, applying LLMs to ship trajectory prediction remains largely unexplored. In this paper, we propose ShipTraj-R1, a novel LLM-based framework that reformulates ship trajectory prediction as a text-to-text generation problem. (1) We design a dynamic prompt containing trajectory information about conflicting ships to guide the model toward adaptive chain-of-thought (CoT) reasoning. (2) We introduce a comprehensive rule-based reward mechanism to incentivize both the reasoning format and the prediction accuracy of the model. (3) ShipTraj-R1 is trained with GRPO, guided by domain-specific prompts and rewards, and uses Qwen3 as the model backbone. Extensive experiments on two complex real-world maritime datasets show that ShipTraj-R1 achieves the lowest prediction error among state-of-the-art deep learning and LLM-based baselines.
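To make points (2) and (3) concrete, the following is a minimal sketch of how a rule-based reward and the group-relative advantage used by GRPO could fit together. It is not the authors' implementation: the abstract does not give the reward terms, so the `<think>`/`<answer>` output format, the `format_weight` and `scale` parameters, the exponential accuracy term, and the function names are all assumptions made for illustration. Only the idea of normalizing each reward against the other completions sampled for the same prompt reflects GRPO itself.

```python
import math
import re
from statistics import mean, pstdev

# Hypothetical output format (an assumption, not taken from the paper):
#   <think>reasoning</think><answer>(lat, lon); (lat, lon); ...</answer>
OUTPUT = re.compile(r"<think>(.+?)</think>\s*<answer>(.+?)</answer>", re.DOTALL)
POINT = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_points(text):
    """Extract (lat, lon) pairs from an answer string."""
    return [(float(a), float(b)) for a, b in POINT.findall(text)]


def rule_based_reward(completion, target, format_weight=0.2, scale=0.01):
    """Score one completion: a format term plus an accuracy term.

    format_weight and scale are illustrative values, not from the paper.
    """
    match = OUTPUT.fullmatch(completion.strip())
    if match is None:
        return 0.0  # malformed output earns nothing
    reward = format_weight  # well-formed reasoning/answer structure
    pred = parse_points(match.group(2))
    if not pred or len(pred) != len(target):
        return reward  # right format, wrong number of waypoints
    # Average displacement error in raw coordinate units (a simplification;
    # a real system would likely use a geodesic distance).
    ade = mean(math.dist(p, t) for p, t in zip(pred, target))
    return reward + (1.0 - format_weight) * math.exp(-ade / scale)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    target = [(30.0, 122.0), (30.01, 122.01)]
    group = [
        "<think>steady course</think><answer>(30.0, 122.0); (30.01, 122.01)</answer>",
        "<think>slight drift</think><answer>(30.0, 122.0); (30.02, 122.01)</answer>",
        "no tags at all",
    ]
    rewards = [rule_based_reward(c, target) for c in group]
    print(rewards)
    print(group_relative_advantages(rewards))
```

In this sketch a completion with an exact prediction receives the maximum reward, a malformed completion receives zero, and the advantages rank each sample relative to its group, so no separate value network is needed to provide a baseline.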