Feb 22, 2026 | arXiv:2602.19304

Safe and Interpretable Multimodal Path Planning for Multi-Agent Cooperation

Haojun Shi, Suyu Ye, Katherine M. Guerrerio, Jianzhi Shen, Yifan Yin, Daniel Khashabi, Chien-Ming Huang, Tianmin Shu

AI Summary

The paper introduces CaPE (Code as Path Editor), a novel method for safe and interpretable multimodal path planning in multi-agent cooperative scenarios. CaPE leverages a vision-language model (VLM) to synthesize path editing programs based on language communication from other agents, which are then verified by a model-based planner to ensure safety. Experiments across simulated and real-world tasks, including autonomous driving and joint carrying, demonstrate CaPE's ability to enhance plan alignment with language while maintaining safety and interpretability.

Key Contribution

CaPE lets a robot adapt its path to language communication from other robots or humans: a VLM synthesizes a path-editing program, and a model-based planner verifies it to ensure safety.

Abstract

Successful cooperation among decentralized agents requires each agent to quickly adapt its plan to the behavior of other agents. In scenarios where agents cannot confidently predict one another's intentions and plans, language communication can be crucial for ensuring safety. In this work, we focus on path-level cooperation in which agents must adapt their paths to one another in order to avoid collisions or perform physical collaboration such as joint carrying. In particular, we propose a safe and interpretable multimodal path planning method, CaPE (Code as Path Editor), which generates and updates path plans for an agent based on the environment and language communication from other agents. CaPE leverages a vision-language model (VLM) to synthesize a path editing program verified by a model-based planner, grounding communication to path plan updates in a safe and interpretable way. We evaluate our approach in diverse simulated and real-world scenarios, including multi-robot and human-robot cooperation in autonomous driving, household, and joint carrying tasks. Experimental results demonstrate that CaPE can be integrated into different robotic systems as a plug-and-play module, greatly enhancing a robot's ability to align its plan to language communication from other robots or humans. We also show that the combination of the VLM-based path editing program synthesis and model-based planning safety enables robots to achieve open-ended cooperation while maintaining safety and interpretability.
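
The synthesize-then-verify idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the VLM call is replaced by a keyword stub, the planner's check is reduced to a waypoint-to-obstacle distance test, and every name here (`shift_segment`, `synthesize_edit_program`, `is_safe`, `edit_path`, `Obstacle`) is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass
from math import hypot
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Path = List[Point]
EditProgram = Callable[[Path], Path]  # a program that maps a path to an edited path


@dataclass(frozen=True)
class Obstacle:
    """Circular obstacle; a hypothetical stand-in for the planner's world model."""
    x: float
    y: float
    radius: float


def shift_segment(start: int, end: int, dx: float, dy: float) -> EditProgram:
    """Hypothetical edit primitive: translate waypoints start..end-1 by (dx, dy)."""
    def edit(path: Path) -> Path:
        return [
            (x + dx, y + dy) if start <= i < end else (x, y)
            for i, (x, y) in enumerate(path)
        ]
    return edit


def synthesize_edit_program(message: str) -> EditProgram:
    """Stub for the VLM: maps a message to an edit program by keyword.

    Assumption: +y is "left" of the direction of travel. A real system would
    prompt a vision-language model with the scene and the message instead.
    """
    text = message.lower()
    if "left" in text:
        return shift_segment(1, 4, 0.0, 1.0)
    if "right" in text:
        return shift_segment(1, 4, 0.0, -1.0)
    return lambda path: list(path)  # no recognized request: leave path unchanged


def is_safe(path: Path, obstacles: List[Obstacle], clearance: float = 0.2) -> bool:
    """Simplified verifier: every waypoint must clear every obstacle.

    Only waypoints are checked, not the segments between them or any dynamics.
    """
    return all(
        hypot(x - o.x, y - o.y) > o.radius + clearance
        for (x, y) in path
        for o in obstacles
    )


def edit_path(path: Path, message: str, obstacles: List[Obstacle]) -> Path:
    """Apply the synthesized edit, keeping the original path if the edit is unsafe."""
    candidate = synthesize_edit_program(message)(path)
    return candidate if is_safe(candidate, obstacles) else path


if __name__ == "__main__":
    path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
    obstacles = [Obstacle(2.0, -1.0, 0.5)]
    print(edit_path(path, "Please keep left", obstacles))  # edit accepted
    print(edit_path(path, "Move right", obstacles))        # edit rejected, original kept
```

Because the edit is an explicit program rather than a raw trajectory, it can be read and checked before execution. In this sketch a rejected edit falls back to the original path, which is one simple policy among several possible ones.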

Multimodal Models | Robotics & Embodied AI | World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
