Apr 9, 2026 | arXiv:2604.08475

LAMP: Lift Image-Editing as General 3D Priors for Open-world Manipulation

Jingjing Wang, Zhengdong Hong, Chong Bao, Yuke Zhu, Junhan Sun, Jun Sun, Guofeng Zhang

AI Summary

The paper introduces LAMP, a method that leverages image editing operations to extract 3D transformation priors for robotic manipulation. By lifting 2D spatial cues from image edits into continuous 3D transformations, LAMP generates geometry-aware representations of inter-object relationships. Experiments demonstrate that LAMP achieves strong zero-shot generalization in open-world manipulation tasks by providing precise 3D transformation guidance.

Key Contribution

Image editing, surprisingly, holds the key to robots that can nimbly manipulate objects in the real world.

Abstract

Human-like generalization in open-world settings remains a fundamental challenge for robotic manipulation. Existing learning-based methods, including reinforcement learning, imitation learning, and vision-language-action models (VLAs), often struggle with novel tasks and unseen environments. Another promising direction is to explore generalizable representations that capture fine-grained spatial and geometric relations for open-world manipulation. While large language models (LLMs) and vision-language models (VLMs) provide strong semantic reasoning based on language or annotated 2D representations, their limited 3D awareness restricts their applicability to fine-grained manipulation. To address this, we propose LAMP, which lifts image-editing as 3D priors to extract inter-object 3D transformations as continuous, geometry-aware representations. Our key insight is that image-editing inherently encodes rich 2D spatial cues, and lifting these implicit cues into 3D transformations provides fine-grained and accurate guidance for open-world manipulation. Extensive experiments demonstrate that LAMP delivers precise 3D transformations and achieves strong zero-shot generalization in open-world manipulation. Project page: https://zju3dv.github.io/LAMP/.
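The abstract does not describe how 2D cues are lifted into 3D, so the following is only a generic illustration of the idea, not the LAMP pipeline. It assumes that matched pixels on an object are available in the original and the edited image, each with a metric depth value, and that the camera intrinsics K are known. Under those assumptions, the pixels can be back-projected with the pinhole model and a rigid transform recovered with the standard Kabsch (SVD) alignment. The function names `backproject` and `kabsch` are hypothetical helpers introduced for this sketch.

```python
import numpy as np

def backproject(pixels, depths, K):
    """Lift (u, v) pixels with metric depth to 3D camera-frame points (pinhole model)."""
    pixels = np.asarray(pixels, dtype=float)
    depths = np.asarray(depths, dtype=float)
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (pixels[:, 0] - cx) * depths / fx
    y = (pixels[:, 1] - cy) * depths / fy
    return np.stack([x, y, depths], axis=1)

def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~= src @ R.T + t."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

Given points lifted from the original image (`src`) and from the edited image (`dst`), `kabsch(src, dst)` returns a rotation and translation describing how the object would need to move in the camera frame. At least three non-collinear correspondences are needed for the rotation to be well determined.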

Computer Vision, Multimodal Models, Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 95

Year: 2026

Venue: N/A
