MIT CSAIL · Mar 10, 2026 · arXiv:2603.09971

TiPToP: A Modular Open-Vocabulary Planning System for Robotic Manipulation

William Shen, Nishanth Kumar, Sahit Chintalapudi, Jie Wang, Christopher Watson, Edward Hu, Jing Cao, Dinesh Jayaraman, Leslie Pack Kaelbling, Tomás Lozano-Pérez

AI Summary

TiPToP is a modular system that combines pretrained vision foundation models with a Task and Motion Planner (TAMP) to solve multi-step manipulation tasks from RGB images and natural-language instructions. It requires no robot data and can be installed and run on a standard DROID setup in under one hour. Across 28 tabletop manipulation tasks in simulation and the real world, TiPToP matches or outperforms a vision-language-action (VLA) model fine-tuned on 350 hours of embodiment-specific demonstrations.

Key Contribution

TiPToP performs multi-step manipulation without *any* robot data, matching or outperforming a VLA model fine-tuned on 350 hours of embodiment-specific demonstrations.

Abstract

We present TiPToP, an extensible modular system that combines pretrained vision foundation models with an existing Task and Motion Planner (TAMP) to solve multi-step manipulation tasks directly from input RGB images and natural-language instructions. Our system aims to be simple and easy to use: it can be installed and run on a standard DROID setup in under one hour and adapted to new embodiments with minimal effort. We evaluate TiPToP, which requires zero robot data, over 28 tabletop manipulation tasks in simulation and the real world and find that it matches or outperforms $\pi_{0.5}$-DROID, a vision-language-action (VLA) model fine-tuned on 350 hours of embodiment-specific demonstrations. TiPToP's modular architecture enables us to analyze the system's failure modes at the component level. We analyze results from an evaluation of 173 trials and identify directions for improvement. We release TiPToP open-source to further research on modular manipulation systems and tighter integration between learning and planning. Project website and code: https://tiptop-robot.github.io

Multimodal Models · Robotics & Embodied AI · Tool Use & Agents

