This paper introduces a unified pushing policy based on flow matching, enhanced with a visual prompting mechanism to generate reactive pushing actions. The visual prompt allows high-level planners to guide the policy, so the same policy can be used across diverse planning scenarios. Experiments show that the policy outperforms existing baselines and works effectively as a low-level primitive within a vision-language model (VLM)-guided table-cleaning framework.
Forget hand-engineered pushing primitives: this unified policy uses visual prompts to achieve versatile and efficient object rearrangement.
As one of the simplest non-prehensile manipulation skills, pushing has been widely studied as an effective means of rearranging objects. Existing approaches, however, typically rely on multi-step push plans composed of predefined pushing primitives, each with a limited scope of application, which restricts their efficiency and versatility across scenarios. In this work, we propose a unified pushing policy that incorporates a lightweight visual prompting mechanism into a flow matching policy to guide the generation of reactive, multimodal pushing actions. The visual prompt can be specified by a high-level planner, enabling the same pushing policy to be reused across a wide range of planning problems. Experimental results demonstrate that the proposed unified pushing policy not only outperforms existing baselines but also serves effectively as a low-level primitive within a VLM-guided planning framework to solve table-cleaning tasks efficiently.
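The abstract does not describe how the flow matching policy turns noise into an action, so the following is only a hedged illustration of the general technique, not the paper's design. It samples a 2-D action by Euler-integrating a velocity field from Gaussian noise at t = 0 to an action at t = 1 along the linear (rectified-flow) path x_t = (1 - t) * x0 + t * x1. Two simplifications are assumptions: a trained velocity network is replaced by the closed-form marginal velocity for a target distribution of two equally likely point-mass actions, and the "visual prompt" is reduced to a 2-D point that positions those two modes. The names `prompt_to_modes`, `mixture_velocity`, and `sample_action` are hypothetical.

```python
import numpy as np


def prompt_to_modes(prompt_xy):
    """HYPOTHETICAL conditioning: a planner-marked 2-D prompt point yields two
    equally likely candidate actions placed symmetrically around it."""
    offsets = np.array([[0.5, 0.0], [-0.5, 0.0]])
    return np.asarray(prompt_xy, dtype=float)[None, :] + offsets


def mixture_velocity(x, t, modes, weights):
    """Marginal velocity E[x1 - x0 | x_t = x] for the linear path
    x_t = (1 - t) * x0 + t * x1, with x0 ~ N(0, I) and x1 drawn from point
    masses at `modes`. Stands in for a trained velocity network (assumption)."""
    sigma = 1.0 - t  # std of x_t given x1 along the linear path
    # Posterior over modes: x_t | mode i ~ N(t * a_i, sigma^2 I)
    sq_dist = np.sum((x[None, :] - t * modes) ** 2, axis=1)
    logits = np.log(weights) - sq_dist / (2.0 * sigma**2)
    post = np.exp(logits - logits.max())
    post /= post.sum()
    # Conditional velocity toward mode a_i is (a_i - x) / (1 - t)
    return post @ ((modes - x[None, :]) / sigma)


def sample_action(modes, weights, rng, num_steps=50):
    """Euler-integrate dx/dt = v(x, t) from noise at t = 0 to an action at t = 1."""
    x = rng.standard_normal(modes.shape[1])
    dt = 1.0 / num_steps
    for k in range(num_steps):  # t stays below 1, so sigma never reaches zero
        x = x + dt * mixture_velocity(x, k * dt, modes, weights)
    return x


# Usage: the same hypothetical prompt, sampled with different noise seeds.
modes = prompt_to_modes([0.3, -0.2])
weights = np.array([0.5, 0.5])
for seed in range(4):
    print(seed, np.round(sample_action(modes, weights, np.random.default_rng(seed)), 3))
```

Different noise seeds land on different modes rather than on their average. This is the property that lets a generative policy represent multimodal pushing actions, whereas a regression policy trained with a squared-error loss on the same data would tend to predict the midpoint between the two modes.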