AeroPlace-Flow is a training-free framework for language-grounded aerial object placement that combines visual foresight with object flow. It synthesizes a goal image from a language instruction and RGB-D observations, grounds that image in metric 3D space, and infers a collision-aware object flow for placement. The method achieves an average success rate of 75% in real-world hardware experiments, demonstrating language-conditioned placement without predefined poses or task-specific training.
Imagine telling a drone "put the box on the table" and it just *does it*. AeroPlace-Flow aims to make this possible with no task-specific training.
Precise object placement remains underexplored in aerial manipulation, where most systems rely on predefined target coordinates and focus primarily on grasping and control. Specifying exact placement poses, however, is cumbersome in real-world settings, where users naturally communicate goals through language. In this work, we present AeroPlace-Flow, a training-free framework for language-grounded aerial object placement that unifies visual foresight with explicit 3D geometric reasoning and object flow. Given RGB-D observations of the object and the placement scene, along with a natural language instruction, AeroPlace-Flow first synthesizes a task-complete goal image using image editing models. The imagined configuration is then grounded into metric 3D space through depth alignment and object-centric reasoning, enabling the inference of a collision-aware object flow that transports the grasped object to a language- and contact-consistent placement configuration. The resulting motion is executed via standard trajectory tracking for an aerial manipulator. AeroPlace-Flow produces executable placement targets without requiring predefined poses or task-specific training. We validate our approach through extensive simulation and real-world experiments, demonstrating reliable language-conditioned placement across diverse aerial scenarios with an average success rate of 75% on hardware.
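The grounding step described in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: a least-squares scale-and-shift fit stands in for "depth alignment", a pinhole camera model is used for back-projection, and a rigid Kabsch fit between corresponding object points stands in for the object flow. The function names (`align_depth`, `backproject`, `rigid_fit`, `object_flow`) are hypothetical, and collision awareness is not modeled.

```python
import numpy as np

def align_depth(pred, metric, mask):
    """Assumed alignment: fit s, b so that s * pred + b ~= metric on masked pixels."""
    x = pred[mask].ravel()
    y = metric[mask].ravel()
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(s), float(b)

def backproject(depth, fx, fy, cx, cy):
    """Pinhole back-projection of an (H, W) depth map to (H*W, 3) camera-frame points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def rigid_fit(src, dst):
    """Kabsch fit: R, t minimizing sum ||R @ src_i + t - dst_i||^2 for paired points."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def object_flow(src, R, t):
    """Per-point displacement that carries each object point to its goal position."""
    return src @ R.T + t - src
```

In this sketch, `src` would hold object points back-projected from the current observation and `dst` the corresponding points from the depth-aligned goal image; how such correspondences are obtained is left open here, as the abstract does not say.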