Cornell, TAU, Technion | Apr 26, 2026 | arXiv:2604.23774

Prox-E: Fine-Grained 3D Shape Editing via Primitive-Based Abstractions

Etai Sella, Hao Phung, Nitay Amiel, O. Litany, Or Patashnik, Hadar Averbuch-Elor

AI Summary

Prox-E is a training-free framework for fine-grained 3D shape editing built on primitive-based geometric abstractions. A pretrained vision-language model (VLM) edits a compact set of geometric primitives that represent the 3D shape, which enables localized structural changes. In the reported experiments, Prox-E balances identity preservation, shape quality, and instruction fidelity better than 2D-based and training-based methods.

Key Contribution

Localized 3D edits without training: Prox-E reshapes objects from language instructions by manipulating a compact set of geometric primitives.

Abstract

Text-based 2D image editing models have recently reached an impressive level of maturity, motivating a growing body of work that heavily depends on these models to drive 3D edits. While effective for appearance-based modifications, such 2D-centric 3D editing pipelines often struggle with fine-grained 3D editing, where localized structural changes must be applied while strictly preserving an object's overall identity. To address this limitation, we propose Prox-E, a training-free framework that enables fine-grained 3D control through an explicit, primitive-based geometric abstraction. Our framework first abstracts an input 3D shape into a compact set of geometric primitives. A pretrained vision-language model (VLM) then edits this abstraction to specify primitive-level changes. These structural edits are subsequently used to guide a 3D generative model, enabling fine-grained, localized modifications while preserving unchanged regions of the original shape. Through extensive experiments, we demonstrate that our method consistently balances identity preservation, shape quality, and instruction fidelity more effectively than various existing approaches, including 2D-based 3D editors and training-based methods.
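
The primitive-level editing step described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Primitive` class, the axis-aligned cuboid parameterization, the `apply_edits` and `changed_names` helpers, and the hand-written `vlm_edit` dictionary (a stand-in for a structured VLM response) are hypothetical and are not taken from the paper. The sketch shows only the idea that an edit touches a few primitives, and that the untouched primitives identify the regions to preserve.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Set, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Primitive:
    """Hypothetical axis-aligned cuboid; the paper's parameterization may differ."""
    name: str
    center: Vec3
    size: Vec3


def apply_edits(
    primitives: List[Primitive], edits: Dict[str, Dict[str, Vec3]]
) -> List[Primitive]:
    """Return a new abstraction with per-primitive field overrides applied."""
    return [replace(p, **edits.get(p.name, {})) for p in primitives]


def changed_names(before: List[Primitive], after: List[Primitive]) -> Set[str]:
    """Names of primitives whose parameters differ (or that are new) after the edit."""
    before_by_name = {p.name: p for p in before}
    return {p.name for p in after if before_by_name.get(p.name) != p}


# Toy abstraction of a chair (made-up numbers).
chair = [
    Primitive("seat", (0.0, 0.5, 0.0), (1.0, 0.1, 1.0)),
    Primitive("back", (0.0, 1.0, -0.45), (1.0, 0.9, 0.1)),
    Primitive("leg_front_left", (-0.45, 0.25, 0.45), (0.1, 0.5, 0.1)),
]

# Assumed stand-in for a VLM's structured answer to "make the backrest taller".
vlm_edit = {"back": {"center": (0.0, 1.2, -0.45), "size": (1.0, 1.3, 0.1)}}

edited = apply_edits(chair, vlm_edit)
edit_region = changed_names(chair, edited)  # primitives a generator would be free to regenerate
```

In this toy setup, only the primitives in `edit_region` would be handed to a downstream generative model as regions to modify; all other primitives are unchanged and mark geometry that should be preserved.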

Computer Vision, Multimodal Models, Natural Language Processing

Citation Metrics

Citations: 0
Influential citations: 0
References: 64
Year: 2026
Venue: N/A
