Microsoft Research | SNU | May 1, 2026 | arXiv:2605.00781

Map2World: Segment Map Conditioned Text to 3D World Generation

Jaeyoung Chung, Suyoung Lee, Jianfeng Xiang, Jiaolong Yang, Kyoung Mu Lee

AI Summary

The paper introduces Map2World, a framework for generating consistent and controllable 3D worlds conditioned on user-defined segment maps of arbitrary shapes and scales. A detail enhancer network is proposed to add fine-grained details while maintaining global coherence by incorporating structural information. By leveraging strong priors from asset generators, Map2World achieves robust generalization across diverse domains, outperforming existing methods in user-controllability, scale consistency, and content coherence.

Key Contribution

Map2World replaces fixed grid layouts with user-defined segment maps of arbitrary shape and scale as the conditioning input. This lets users generate scale-consistent 3D worlds with finer control over layout and extent.

Abstract

3D world generation is essential for applications such as immersive content creation and autonomous driving simulation. Recent advances in 3D world generation have shown promising results; however, these methods are constrained by grid layouts and suffer from inconsistencies in object scale across the world. In this work, we introduce Map2World, a novel framework that is the first to enable 3D world generation conditioned on user-defined segment maps of arbitrary shapes and scales, ensuring global-scale consistency and flexibility across expansive environments. To further improve quality, we propose a detail enhancer network that generates fine details of the world. By incorporating global structure information, the detail enhancer adds fine-grained details without compromising overall scene coherence. We design the entire pipeline to leverage strong priors from asset generators, achieving robust generalization across diverse domains even with limited training data for scene generation. Extensive experiments demonstrate that our method significantly outperforms existing approaches in user-controllability, scale consistency, and content coherence, enabling users to generate 3D worlds under more complex conditions.

Computer Vision, Multimodal Models, World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 48

Year: 2026

Venue: N/A
