This paper introduces a direct contact-tolerant (DCT) motion planner that integrates vision-language models (VLMs) for robot navigation in cluttered environments. The approach uses a VLM point cloud partitioner (VPP) to reason about contact tolerance in image space and generate contact-aware point clouds. A VPP-guided navigation (VGN) module then formulates contact-tolerant motion planning as a perception-to-control optimization problem solved by a DNN. Experiments in simulation and on a real robot demonstrate DCT's robustness and efficiency compared to baselines.
Robots can now navigate cluttered spaces more efficiently by directly "seeing" and tolerating contact with movable objects, thanks to a vision-language model that reasons about contact in image space.
Navigation in cluttered environments often requires robots to tolerate contact with movable or deformable objects to maintain efficiency. Existing contact-tolerant motion planning (CTMP) methods rely on indirect spatial representations (e.g., prebuilt maps or obstacle sets), which leads to inaccuracies and poor adaptability to environmental uncertainty. To address this issue, we propose a direct contact-tolerant (DCT) planner, which integrates vision-language models (VLMs) into direct point perception and navigation and comprises two key components. The first is a VLM point cloud partitioner (VPP), which performs contact-tolerance reasoning in image space using a VLM, caches the inference masks, propagates them across frames using odometry, and projects them onto the current scan to generate a contact-aware point cloud. The second is VPP-guided navigation (VGN), which formulates CTMP as a perception-to-control optimization problem under direct contact-aware point cloud constraints and solves it with a specialized deep neural network (DNN). We implement DCT in Isaac Sim and on a real car-like robot, demonstrating that DCT achieves robust and efficient navigation in cluttered environments with movable obstacles and outperforms representative baselines across diverse metrics. The code is available at: https://github.com/ChrisLeeUM/DCT.
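The mask caching, odometry propagation, and projection steps of VPP can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a pinhole camera, a cached boolean mask in which `True` marks contact-tolerant pixels, and an odometry-derived 4x4 transform from the current sensor frame to the cached camera frame. The names `partition_scan`, `T_cache_from_cur`, and `K` are hypothetical and chosen only for illustration.

```python
import numpy as np

def partition_scan(points_cur, T_cache_from_cur, K, mask):
    """Label each scan point as contact-tolerant (True) or not (False).

    points_cur:        (N, 3) points in the current sensor frame (assumed z-forward).
    T_cache_from_cur:  (4, 4) transform from odometry, current frame -> cached camera frame.
    K:                 (3, 3) pinhole intrinsics of the camera that produced the mask.
    mask:              (H, W) boolean mask cached from an earlier VLM inference.
    """
    n = points_cur.shape[0]
    # Move the current scan into the frame in which the mask was computed.
    pts_h = np.hstack([points_cur, np.ones((n, 1))])
    pts_cam = (T_cache_from_cur @ pts_h.T).T[:, :3]

    tolerant = np.zeros(n, dtype=bool)
    # Only points in front of the camera can be projected.
    idx = np.nonzero(pts_cam[:, 2] > 1e-6)[0]

    # Pinhole projection to pixel coordinates.
    uvw = (K @ pts_cam[idx].T).T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)

    # Points that fall outside the image stay non-tolerant (conservative default).
    h, w = mask.shape
    in_image = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    tolerant[idx[in_image]] = mask[v[in_image], u[in_image]]
    return tolerant

# Toy setup: 100x100 image whose left half is marked contact-tolerant.
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
mask = np.zeros((100, 100), dtype=bool)
mask[:, :50] = True

points = np.array([[-0.5, 0.0, 2.0],   # projects into the left half
                   [0.5, 0.0, 2.0],    # projects into the right half
                   [0.0, 0.0, -1.0],   # behind the camera
                   [5.0, 0.0, 1.0]])   # outside the image
labels = partition_scan(points, np.eye(4), K, mask)
```

In this sketch, points that cannot be matched to a mask pixel default to non-tolerant, so a planner consuming `labels` would treat them as ordinary obstacles. That default is a design assumption made here, not something stated in the abstract.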