The paper introduces Chain-of-Context Learning (CCL), a reinforcement learning framework for multi-task Vehicle Routing Problems (VRPs) that explicitly models constraint and node dynamics, which existing unified solvers often overlook. CCL uses a Relevance-Guided Context Reformulation (RGCR) module to prioritize salient constraints at each decision step and a Trajectory-Shared Node Re-embedding (TSNR) module to aggregate shared node features across all trajectories' contexts. Experiments on 48 VRP variants (16 in-distribution and 32 out-of-distribution with unseen constraints) show that CCL achieves the best performance on all in-distribution tasks and on the majority of out-of-distribution tasks.
Achieve state-of-the-art results on multi-task vehicle routing problems by dynamically adapting to evolving constraints, including on most tasks whose constraints are unseen during training.
Multi-task Vehicle Routing Problems (VRPs) aim to minimize routing costs while satisfying diverse constraints. Existing solvers typically adopt a unified reinforcement learning (RL) framework to learn generalizable patterns across tasks. However, they often overlook constraint and node dynamics during the decision process, so the model fails to react accurately to the current context. To address this limitation, we propose Chain-of-Context Learning (CCL), a novel framework that progressively captures the evolving context to guide fine-grained node adaptation. Specifically, CCL constructs step-wise contextual information via a Relevance-Guided Context Reformulation (RGCR) module, which adaptively prioritizes salient constraints. This context then guides node updates through a Trajectory-Shared Node Re-embedding (TSNR) module, which aggregates shared node features from all trajectories' contexts and uses them to update the inputs for the next step. By modeling the evolving preferences of the RL agent, CCL captures step-by-step dependencies in sequential decision-making. We evaluate CCL on 48 diverse VRP variants: 16 in-distribution tasks and 32 out-of-distribution tasks with unseen constraints. Experimental results show that CCL performs favorably against state-of-the-art baselines, achieving the best performance on all in-distribution tasks and on the majority of out-of-distribution tasks.
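The two-stage step described above (relevance-weighted context, then a node update shared across trajectories) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no equations, so the dot-product relevance score, the softmax weighting, the mean over trajectories, the additive update, and every name here (`reformulate_context`, `reembed_nodes`, `step_size`) are assumptions chosen only to make the data flow concrete.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reformulate_context(state, constraint_embs):
    """RGCR-like step (assumed form): weight each constraint embedding
    by its relevance to the current decision state."""
    weights = softmax([dot(state, c) for c in constraint_embs])
    dim = len(state)
    context = [
        sum(w * c[d] for w, c in zip(weights, constraint_embs))
        for d in range(dim)
    ]
    return context, weights


def reembed_nodes(node_embs, contexts, step_size=0.1):
    """TSNR-like step (assumed form): average the contexts of all
    trajectories and shift every node embedding by that shared signal."""
    n_traj = len(contexts)
    dim = len(contexts[0])
    shared = [sum(ctx[d] for ctx in contexts) / n_traj for d in range(dim)]
    return [[x + step_size * s for x, s in zip(node, shared)]
            for node in node_embs]


# Toy data: two constraint embeddings, two trajectories, two nodes.
constraints = [[1.0, 0.0], [0.0, 1.0]]
states = [[2.0, 0.0], [0.0, 2.0]]
nodes = [[0.0, 0.0], [1.0, 1.0]]

results = [reformulate_context(s, constraints) for s in states]
contexts = [ctx for ctx, _ in results]
new_nodes = reembed_nodes(nodes, contexts)  # inputs for the next step
```

In this toy setup each trajectory emphasizes a different constraint, and the re-embedded nodes receive the average of both contexts. In a learned model the relevance scores and the update would presumably be parameterized and trained with the RL objective; the sketch only shows how step-wise context could feed back into node inputs.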