The paper introduces HiMAP-Travel, a hierarchical multi-agent framework for long-horizon LLM planning under hard constraints, a setting where sequential agents tend to drift from global constraints as their context grows. HiMAP-Travel splits the task into two levels. A Coordinator agent handles strategic coordination, and Day Executor agents carry out day-level planning in parallel. The framework relies on a transactional monitor, a bargaining protocol, and a single policy trained with GRPO. Experiments on TravelPlanner and FlexTravelBench show that HiMAP-Travel, powered by Qwen3-8B, achieves a higher Final Pass Rate than sequential baselines and other multi-agent approaches. Its parallel execution also lowers latency.
By decomposing the task into a hierarchy of coordinating and executing agents, LLMs can plan complex trips under hard constraints such as budget and diversity with 2.5x lower latency.
Sequential LLM agents fail on long-horizon planning with hard constraints such as budgets and diversity requirements. As planning progresses and context grows, these agents drift from global constraints. We propose HiMAP-Travel, a hierarchical multi-agent framework that splits planning into strategic coordination and parallel day-level execution. A Coordinator allocates resources across days, and Day Executors then plan independently in parallel. Three key mechanisms make this work. First, a transactional monitor enforces budget and uniqueness constraints across the parallel agents. Second, a bargaining protocol lets agents reject infeasible sub-goals and trigger re-planning. Third, a single policy trained with GRPO powers all agents through role conditioning. On TravelPlanner, HiMAP-Travel with Qwen3-8B achieves a Final Pass Rate (FPR) of 52.78% on validation and 52.65% on test. In a controlled comparison that uses the same model, training, and tools, it outperforms the sequential DeepTravel baseline by +8.67 pp. It also surpasses ATLAS by +17.65 pp and MTP by +10.0 pp. On FlexTravelBench multi-turn scenarios, it achieves 44.34% (2-turn) and 37.42% (3-turn) FPR, and parallelization reduces latency by 2.5x.
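The transactional monitor keeps parallel Day Executors from jointly breaking global constraints. The abstract does not describe its interface, so the code below is only a minimal Python sketch built on stated assumptions. The names `TransactionalMonitor`, `try_commit`, `Commit`, and `run_days_in_parallel` are hypothetical. Each day plan is reduced to a cost plus a list of venues. A single thread lock stands in for whatever concurrency control the authors actually use. The sketch shows one idea: an atomic check-and-reserve against a shared budget and a shared set of used venues.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Commit:
    """Outcome of one day's commit attempt (hypothetical structure)."""
    ok: bool
    reason: str = ""


@dataclass
class TransactionalMonitor:
    """Hypothetical shared monitor guarding two global constraints for
    parallel Day Executors: a total trip budget and no venue reused across days."""
    budget: float
    spent: float = 0.0
    used_venues: set[str] = field(default_factory=set)
    _lock: threading.Lock = field(default_factory=threading.Lock, init=False, repr=False)

    def try_commit(self, day: int, cost: float, venues: list[str]) -> Commit:
        # Check and reserve under one lock so that a day's plan commits atomically:
        # either its cost and venues are all recorded, or none of them are.
        with self._lock:
            clashes = sorted(set(venues) & self.used_venues)
            if clashes:
                return Commit(False, f"day {day}: venue(s) already used: {clashes}")
            if self.spent + cost > self.budget:
                remaining = self.budget - self.spent
                return Commit(False, f"day {day}: exceeds remaining budget {remaining:.2f}")
            self.spent += cost
            self.used_venues.update(venues)
            return Commit(True)


def run_days_in_parallel(monitor, day_plans):
    """day_plans maps day -> (cost, venues). These stand in for Day Executor outputs."""
    with ThreadPoolExecutor() as pool:
        futures = {d: pool.submit(monitor.try_commit, d, cost, venues)
                   for d, (cost, venues) in day_plans.items()}
        return {d: f.result() for d, f in futures.items()}


monitor = TransactionalMonitor(budget=300.0)
results = run_days_in_parallel(monitor, {
    1: (120.0, ["Museum", "Cafe A"]),
    2: (90.0, ["Museum", "Park"]),  # clashes with day 1 on "Museum"
    3: (80.0, ["Harbor"]),
})
for day, r in sorted(results.items()):
    print(day, "committed" if r.ok else f"rejected: {r.reason}")
```

Days 1 and 2 both claim "Museum". Exactly one of them commits, and which one wins depends on thread scheduling. Day 3 always commits. A rejected `Commit` carries a reason string. In a design like the one the abstract describes, a Day Executor could pass that signal back to the Coordinator to trigger re-planning. The abstract, however, does not say that HiMAP-Travel's bargaining protocol works this way.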