HuggingFace | UCSD | UIUC | UMich | World Labs | Jun 11, 2026 | arXiv:2606.13652

World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible

Hao Zhang, Mohamed El Banani, Jen-Hao Cheng, Paul Zhang, Yi Hua, Ben Mildenhall, Christoph Lassner, Narendra Ahuja, Gengshan Yang

AI Summary

This paper introduces World Tracing, a generative pixel-aligned geometry representation that predicts 3D points aligned with observed pixels while also completing geometry beyond the visible surface. It is instantiated with a world-tracing diffusion transformer (WT-DiT) that treats multiple geometry layers as separate denoising tokens and is trained with a mixed noise schedule that balances visible-surface reconstruction against occluded-geometry generation. The method performs strongly on both visible-surface reconstruction and complete geometry generation across object, scene, and dynamic benchmarks, outperforming depth predictors and image-to-3D generators, and it enables applications such as text-driven 3D scene editing and geometry-conditioned novel-view video synthesis.

Key Contribution

World Tracing reconstructs visible surfaces and also generates occluded geometry while preserving 2D-to-3D correspondence, which enables text-driven 3D scene editing and geometry-conditioned novel-view video synthesis.

Abstract

Image-to-3D methods often trade off faithfulness and completeness: depth estimators are anchored to input pixels but stop at the visible surface, while image-to-3D models generate complete shapes that are often misaligned with the input. We introduce World Tracing, a generative pixel-aligned geometry representation that predicts 3D points aligned with observed pixels while completing geometry beyond the visible surface. For each input pixel, World Tracing predicts an ordered stack of camera-space 3D points, where the first layer represents the visible surface and subsequent layers represent front-to-back intersections with occluded surfaces. We instantiate this representation with a world-tracing diffusion transformer, WT-DiT, which treats multiple geometry layers as separate denoising tokens coupled through factorized and global attention. WT-DiT is trained with pixel-space flow matching and a mixed noise schedule that balances visible-surface reconstruction with occluded-geometry generation. World Tracing achieves strong performance on visible-surface reconstruction and complete geometry generation across object, scene, and dynamic benchmarks, outperforming both depth predictors and image-to-3D generators. It also preserves 2D-to-3D correspondence, enabling text-driven 3D scene editing, geometry-conditioned novel-view video synthesis, and training-free integration with textured-mesh generators.
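To make the layered representation concrete, here is a minimal sketch of an ordered, pixel-aligned stack of camera-space points, built analytically for a toy scene. It is not the paper's pipeline or WT-DiT: the pinhole camera, the single-sphere scene, the two-layer limit, and the names `pixel_rays` and `trace_sphere_layers` are all assumptions chosen for illustration. Each pixel's ray is intersected with a sphere, so layer 0 holds the visible front hit and layer 1 holds the occluded back hit.

```python
import numpy as np

def pixel_rays(height, width, focal):
    """Unit ray directions in camera space for a pinhole camera looking down +z."""
    v, u = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    x = (u + 0.5 - width / 2.0) / focal
    y = (v + 0.5 - height / 2.0) / focal
    dirs = np.stack([x, y, np.ones_like(x)], axis=-1)
    return dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)

def trace_sphere_layers(dirs, center, radius):
    """Intersect every pixel ray (origin at the camera) with one sphere.

    Returns points of shape (2, H, W, 3), ordered front to back, and a
    boolean mask of shape (2, H, W) marking which entries are real hits.
    """
    center = np.asarray(center, dtype=np.float64)
    b = dirs @ center                                  # (H, W): d . c per ray
    disc = b**2 - (center @ center - radius**2)        # discriminant of |t d - c|^2 = r^2
    hit = disc > 0
    root = np.sqrt(np.where(hit, disc, 0.0))
    t = np.stack([b - root, b + root], axis=0)         # (2, H, W): near hit, far hit
    valid = hit[None] & (t > 0)
    points = t[..., None] * dirs[None]                 # camera-space 3D points
    points = np.where(valid[..., None], points, np.nan)  # misses are marked NaN
    return points, valid

# Toy scene (assumed values): 64x64 image, unit sphere 3 units in front of the camera.
dirs = pixel_rays(64, 64, focal=64.0)
points, valid = trace_sphere_layers(dirs, center=(0.0, 0.0, 3.0), radius=1.0)

visible = points[0]        # layer 0: the visible surface, as a per-pixel point map
complete = points[valid]   # all layers flattened: (N, 3) cloud including the hidden back side
```

Because every layer is indexed by the same pixel grid, any point in `complete` can be traced back to the image pixel whose ray produced it, which is the 2D-to-3D correspondence the abstract refers to. A depth estimator would supply only `visible`; the later layers are what a generative model would have to predict.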

Computer Vision, Multimodal Models, World Models & Planning
