Prox-E is a training-free framework for fine-grained 3D shape editing built on primitive-based geometric abstractions. It uses a pretrained vision-language model (VLM) to edit a compact set of geometric primitives representing the 3D shape, enabling localized structural changes. Experiments show that Prox-E balances identity preservation, shape quality, and instruction fidelity better than 2D-based and training-based methods.
Achieve surgical 3D edits without training: Prox-E lets you reshape objects with language by manipulating a compact set of geometric primitives.
Text-based 2D image editing models have recently reached an impressive level of maturity, motivating a growing body of work that heavily depends on these models to drive 3D edits. While effective for appearance-based modifications, such 2D-centric 3D editing pipelines often struggle with fine-grained 3D editing, where localized structural changes must be applied while strictly preserving an object's overall identity. To address this limitation, we propose Prox-E, a training-free framework that enables fine-grained 3D control through an explicit, primitive-based geometric abstraction. Our framework first abstracts an input 3D shape into a compact set of geometric primitives. A pretrained vision-language model (VLM) then edits this abstraction to specify primitive-level changes. These structural edits are subsequently used to guide a 3D generative model, enabling fine-grained, localized modifications while preserving unchanged regions of the original shape. Through extensive experiments, we demonstrate that our method consistently balances identity preservation, shape quality, and instruction fidelity more effectively than various existing approaches, including 2D-based 3D editors and training-based methods.
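To make the middle step of the pipeline concrete, here is a minimal sketch of what a primitive-level edit could look like. It is not the Prox-E implementation: the `Primitive` and `EditOp` types, the choice of axis-aligned cuboids, and the `apply_edits` function are all hypothetical names introduced for illustration. The abstraction step and the 3D generative model are omitted, and the edit operations that a VLM would propose are written by hand.

```python
from dataclasses import dataclass, replace
from typing import List, Set, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Primitive:
    """Hypothetical axis-aligned cuboid primitive (not the paper's representation)."""
    name: str
    center: Vec3
    size: Vec3


@dataclass(frozen=True)
class EditOp:
    """Hypothetical primitive-level edit; a VLM would be asked to emit these."""
    target: str
    scale: Vec3 = (1.0, 1.0, 1.0)
    translate: Vec3 = (0.0, 0.0, 0.0)


def apply_edits(
    primitives: List[Primitive], ops: List[EditOp]
) -> Tuple[List[Primitive], Set[str]]:
    """Apply edits and return the new primitives plus the names that changed."""
    by_name = {p.name: p for p in primitives}
    changed: Set[str] = set()
    for op in ops:
        if op.target not in by_name:
            raise KeyError(f"unknown primitive: {op.target}")
        p = by_name[op.target]
        edited = replace(
            p,
            center=tuple(c + t for c, t in zip(p.center, op.translate)),
            size=tuple(s * k for s, k in zip(p.size, op.scale)),
        )
        if edited != p:
            changed.add(p.name)
        by_name[op.target] = edited
    # Preserve the original ordering so untouched primitives stay in place.
    return [by_name[p.name] for p in primitives], changed


# Illustrative instruction: "make the chair back taller".
chair = [
    Primitive("seat", (0.0, 0.0, 0.5), (1.0, 1.0, 0.1)),
    Primitive("back", (0.0, -0.5, 1.0), (1.0, 0.1, 1.0)),
]
ops = [EditOp("back", scale=(1.0, 1.0, 2.0), translate=(0.0, 0.0, 0.5))]
edited_chair, changed_names = apply_edits(chair, ops)
print(changed_names)
```

In this sketch, the returned set of changed names is one way a downstream generator could tell which regions to regenerate and which to keep fixed. That use is an assumption about how localized editing might be wired up, not a description of the authors' method.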