Apr 23, 2026 · arXiv:2604.21592

Sculpt4D: Generating 4D Shapes via Sparse-Attention Diffusion Transformers

Minghao Yin, Wenbo Hu, Jiale Xu, Ying Shan, Kai Han

AI Summary

Sculpt4D introduces a 4D generative framework built upon a pretrained 3D Diffusion Transformer (Hunyuan3D 2.1) to address the challenges of temporally coherent 4D shape generation. It employs a novel Block Sparse Attention mechanism, anchored to the initial frame with a time-decaying sparse mask, to efficiently model spatiotemporal dependencies. This approach achieves state-of-the-art results in 4D synthesis while reducing computational costs by 56% compared to full attention.

Key Contribution

Sculpt4D extends a pretrained 3D shape generator to dynamic 4D objects, reporting state-of-the-art temporal coherence while using 56% less total computation than full attention.

Abstract

Recent breakthroughs in 3D generative modeling have yielded remarkable progress in static shape synthesis, yet high-fidelity dynamic 4D generation remains elusive, hindered by temporal artifacts and prohibitive computational demands. We present Sculpt4D, a native 4D generative framework that integrates efficient temporal modeling into a pretrained 3D Diffusion Transformer (Hunyuan3D 2.1), thereby mitigating the scarcity of 4D training data. At its core lies a Block Sparse Attention mechanism that preserves object identity by anchoring to the initial frame while capturing rich motion dynamics via a time-decaying sparse mask. This design faithfully models complex spatiotemporal dependencies while sidestepping the quadratic overhead of full attention and reducing the network's total computation by 56%. Consequently, Sculpt4D establishes a new state-of-the-art in temporally coherent 4D synthesis and charts a path toward efficient and scalable 4D generation.
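
As a rough illustration of the masking idea described in the abstract, the sketch below builds a block-level attention mask in which every frame can attend to all blocks of the initial frame, and the number of permitted block pairs between two other frames shrinks as their temporal distance grows. The abstract does not specify the actual mask, so everything here is an assumption made for illustration: the function name `anchored_decay_block_mask`, the `decay` parameter, the geometric decay rule, and the banded pattern within each frame pair. The resulting density is not meant to reproduce the 56% figure reported above.

```python
import numpy as np

def anchored_decay_block_mask(num_frames, blocks_per_frame, decay=0.5):
    """Hypothetical block mask: True means the query block may attend to the key block.

    Rows index query blocks and columns index key blocks, both ordered frame by frame.
    """
    T, B = num_frames, blocks_per_frame
    mask = np.zeros((T * B, T * B), dtype=bool)
    blk = np.arange(B)
    # Distance between block indices inside a frame pair, shape (B, B).
    offset = np.abs(blk[:, None] - blk[None, :])
    for i in range(T):          # query frame
        for j in range(T):      # key frame
            if j == 0:
                # Anchor: every frame sees all blocks of the initial frame.
                keep = np.ones((B, B), dtype=bool)
            else:
                # Assumed rule: the band of kept block pairs narrows
                # geometrically with the frame distance |i - j|.
                width = int(np.ceil(B * decay ** abs(i - j)))
                keep = offset < width
            mask[i * B:(i + 1) * B, j * B:(j + 1) * B] = keep
    return mask

mask = anchored_decay_block_mask(num_frames=4, blocks_per_frame=4)
density = mask.mean()  # fraction of block pairs actually computed
```

In a block-sparse attention kernel, only the block pairs marked True would be computed, so `density` gives a rough sense of how much of the full attention cost remains under these assumed settings.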

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 68

Year: 2026

Venue: N/A
