Tsinghua | May 28, 2026 | arXiv:2605.30065

Boosting Zero-Shot 3D Style Transfer with 2D Pre-trained Priors

Xin Dong, Yunzhi Teng, Wenfeng Deng, Yansong Tang

AI Summary

This paper addresses data scarcity in zero-shot 3D style transfer by incorporating a decoder pre-trained on large-scale 2D image datasets. The authors propose Data-Sufficient StyleGaussian (DS-StyleGaussian), which combines feature Gaussian splatting with deferred stylization to leverage the prior knowledge encoded in the 2D decoder. Experiments show that DS-StyleGaussian outperforms existing zero-shot 3D style transfer methods in visual quality, suggesting that 2D pre-training is effective for 3D tasks.

Key Contribution

Achieves high-quality 3D style transfer from a single scene by injecting a 2D-pretrained decoder, sidestepping the usual data scarcity bottleneck.

Abstract

In this work, we focus on zero-shot 3D style transfer that can generate multi-view consistent stylized views of the 3D scene given an arbitrary style image. We primarily tackle the issue of data scarcity in 3D style transfer, which arises when each model is trained on only a single scene, thereby limiting the number of available content images. This scarcity significantly hampers stylization performance, as model optimization relies on a sufficient number of content-style image pairs to provide supervisory signals. Our core idea is to integrate a decoder pre-trained on large-scale 2D image datasets into the 3D style transfer pipeline, thereby leveraging the prior knowledge encoded in the decoder from learning over numerous content-style image pairs. Our method combines feature Gaussian splatting and deferred stylization, enabling high-quality stylization with the data-sufficient decoder network while ensuring view consistency by unifying view-dependent operations into a view-invariant process. Experiments demonstrate that our Data-Sufficient StyleGaussian (DS-StyleGaussian) model outperforms existing zero-shot 3D style transfer methods in terms of visual quality across various datasets. This work also suggests that 2D pre-training can serve as a strong enhancement for 3D tasks, bridging the data gap between 2D and 3D.
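The abstract does not spell out how view-dependent operations become view-invariant, so the following is only an illustrative sketch under stated assumptions, not the authors' formulation. It assumes an AdaIN-style per-channel scale and shift as a stand-in for the style transform, computes content statistics from the per-Gaussian features rather than from any rendered view, and assumes blending weights that sum to one per pixel. Under those assumptions, stylizing before rendering and stylizing after rendering (the deferred order) give the same feature map, because alpha blending is linear. All names (`adain_affine`, `blend`, `random_view_weights`) are hypothetical, and the pre-trained 2D decoder that would map the stylized feature map to RGB is omitted.

```python
import numpy as np

def adain_affine(content_feats, style_feats, eps=1e-5):
    """Per-channel scale/shift that maps content statistics onto style statistics.

    Assumed stand-in for the style transform; the source does not name one.
    """
    c_mean = content_feats.mean(axis=0)
    c_std = content_feats.std(axis=0) + eps
    s_mean = style_feats.mean(axis=0)
    s_std = style_feats.std(axis=0) + eps
    scale = s_std / c_std
    shift = s_mean - scale * c_mean
    return scale, shift

def blend(weights, feats):
    """Alpha-blend per-Gaussian features (N x C) into per-pixel features (P x C)."""
    return weights @ feats

rng = np.random.default_rng(0)
num_gaussians, num_pixels, channels = 64, 16, 8

# Toy per-Gaussian features and toy style features (random placeholders).
gaussian_feats = rng.normal(size=(num_gaussians, channels))
style_feats = rng.normal(loc=2.0, scale=3.0, size=(100, channels))

def random_view_weights():
    """Hypothetical blending weights for one view; each pixel's weights sum to 1."""
    w = rng.random((num_pixels, num_gaussians))
    return w / w.sum(axis=1, keepdims=True)

# Statistics come from the 3D features, so the transform is the same for every view.
scale, shift = adain_affine(gaussian_feats, style_feats)

def stylize_then_render(weights):
    return blend(weights, gaussian_feats * scale + shift)

def render_then_stylize(weights):
    # Deferred order: render the feature map first, apply the style transform after.
    return blend(weights, gaussian_feats) * scale + shift

view_a = random_view_weights()
max_gap = np.abs(stylize_then_render(view_a) - render_then_stylize(view_a)).max()
print(f"max difference between the two orders: {max_gap:.2e}")
```

In an actual splatting renderer the accumulated opacity of a pixel can be below one, in which case the shift term no longer commutes exactly; the normalization here is a simplifying assumption made to keep the example small.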

Computer Vision, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 30

Year: 2026

Venue: N/A
