Microsoft Research, CUHK, Oxford, SJTU | Jun 9, 2026 | arXiv:2606.10478

3D-CoS: A New 3D Reconstruction Paradigm Based on VLM Code Synthesis

Yuhao Wang, Puyi Wang, Linjie Li, Zhengyuan Yang, Kevin Qinghong Lin, Yu Cheng

AI Summary

This paper introduces 3D Code Synthesis (3D-CoS), a 3D reconstruction paradigm in which assets are constructed as executable Blender code, a more programmable and interpretable medium than low-level representations such as NeRF, point clouds, or meshes. The authors evaluate representative open-source and closed-source vision-language models (VLMs) on code-based reconstruction under a unified protocol, and introduce structured code-synthesis workflows: blueprint-based planning, Retrieval-Augmented Generation (RAG) over Blender API documentation, few-shot geometric demonstrations, and a component-level Agent workflow for part-wise code generation. In a targeted evaluation of localized text-driven edits, the code-based approach yields stronger edit fidelity and better preservation of unedited regions than a point-cloud-based 3D editing baseline, which points to code synthesis as a promising direction for editable 3D reconstruction.

Key Contribution

In a targeted editing evaluation, code-based 3D reconstruction yields stronger edit fidelity and better preservation of unedited regions than a point-cloud-based 3D editing baseline.

Abstract

Most recent 3D reconstruction and editing systems operate on implicit and explicit representations such as NeRF, point clouds, or meshes. While these representations enable high-fidelity rendering, they are fundamentally low-level and hard to control programmatically. In contrast, we propose and systematically evaluate a new 3D reconstruction paradigm, 3D Code Synthesis (3D-CoS), where 3D assets are constructed as executable Blender code, a programmatic and interpretable medium. To assess how well current VLMs can use code to represent 3D objects, we evaluate representative open-source and closed-source VLMs in code-based reconstruction under a unified protocol. We further introduce a suite of structured code-synthesis workflows, including blueprint-based planning, Retrieval-Augmented Generation (RAG) over Blender API documentation, few-shot geometric demonstrations, and a component-level Agent workflow for part-wise code generation. To demonstrate the unique advantages of this representation, we further evaluate localized text-driven modifications and compare our code-based edits with a point-cloud-based 3D editing baseline. Our study shows that code as a 3D representation offers strong controllability and locality, yielding stronger edit fidelity and better preservation of unedited regions in our targeted editing evaluation. Our work also analyzes the potential of this paradigm, delineates the current capability frontier of VLMs for programmatic 3D modeling, and highlights code synthesis as a promising direction for editable 3D reconstruction.
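
To make the idea of "code as a 3D representation" concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a hypothetical `Part` schema and hypothetical helpers (`emit_part`, `emit_script`, `edit_part`) that turn a part list into Blender Python source text. The only Blender calls it emits are the standard `bpy.ops.mesh.primitive_cube_add` and `bpy.ops.mesh.primitive_cylinder_add` operators; the table geometry is invented for illustration. The script is generated as a string, so the sketch runs without Blender installed.

```python
from dataclasses import dataclass, replace


@dataclass
class Part:
    """One named component of an object (hypothetical schema, not from the paper)."""
    name: str
    primitive: str    # "cube" or "cylinder"
    location: tuple   # (x, y, z) in Blender units
    params: dict      # keyword arguments passed to the primitive operator


# Real bpy operators; choosing these two is an illustrative assumption.
_OPERATORS = {
    "cube": "bpy.ops.mesh.primitive_cube_add",
    "cylinder": "bpy.ops.mesh.primitive_cylinder_add",
}


def emit_part(part):
    """Return the Blender Python lines that build a single part."""
    kwargs = [f"{key}={value!r}" for key, value in sorted(part.params.items())]
    kwargs.append(f"location={part.location!r}")
    call = f"{_OPERATORS[part.primitive]}({', '.join(kwargs)})"
    return "\n".join([
        f"# part: {part.name}",
        call,
        # A newly added primitive becomes the active object, so it can be renamed here.
        f"bpy.context.active_object.name = {part.name!r}",
    ])


def emit_script(parts):
    """Concatenate per-part code into one script, one block per part."""
    return "import bpy\n\n" + "\n\n".join(emit_part(p) for p in parts) + "\n"


def edit_part(parts, name, **changes):
    """Return a new part list in which only the named part's parameters change."""
    if not any(p.name == name for p in parts):
        raise KeyError(name)
    return [
        replace(p, params={**p.params, **changes}) if p.name == name else p
        for p in parts
    ]


# Invented example object: a table with a flat top and four cylindrical legs.
table = [Part("top", "cube", (0.0, 0.0, 0.75),
              {"size": 1.0, "scale": (1.2, 0.8, 0.05)})]
for leg_name, x, y in [("leg_front_left", -0.5, -0.3), ("leg_front_right", 0.5, -0.3),
                       ("leg_back_left", -0.5, 0.3), ("leg_back_right", 0.5, 0.3)]:
    table.append(Part(leg_name, "cylinder", (x, y, 0.375),
                      {"radius": 0.03, "depth": 0.75}))

before = emit_script(table)
after = emit_script(edit_part(table, "leg_front_left", radius=0.05))

# A localized edit shows up as a localized diff in the generated code.
changed = [(old, new)
           for old, new in zip(before.splitlines(), after.splitlines())
           if old != new]
for old, new in changed:
    print("-", old)
    print("+", new)
```

Because each part maps to its own block of code, thickening one leg touches only that leg's operator call, and every other line of the script is byte-for-byte identical. This is one simple way to read the abstract's claim about locality; how the paper actually measures edit fidelity and preservation is not specified here.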

Code Generation & Program Synthesis | Multimodal Models

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
