Feeling AI | NTU | Oxford | Shanghai AI Lab | May 27, 2026 | arXiv:2605.27852

ClothTransformer: Unified Latent-Space Transformers for Scalable Cloth Simulation

Yu Zhang, Yidi Shao, Wenqi Ouyang, Yushi Lan, Zhexin Liang, Chengrui Wu, Xudong Xu, Xingang Pan

AI Summary

ClothTransformer reformulates cloth simulation as autoregressive sequence modeling in a learned latent space, enabling a unified model to handle diverse scenarios like body-driven garments, robotic manipulation, and free-fall collisions. By compressing arbitrary-resolution meshes into fixed-size latent tokens, the method achieves temporal dynamics computation independent of mesh resolution. Experiments show ClothTransformer achieves 4-9x lower error than prior state-of-the-art methods and robust collision handling, enabled by a new high-fidelity dataset and differentiable Continuous Collision Detection (CCD) module.

Key Contribution

ClothTransformer achieves state-of-the-art cloth simulation by learning a unified latent space, allowing a single model to handle diverse scenarios and mesh resolutions with significantly improved accuracy and collision handling.

Abstract

Unified and scalable Transformers have recently achieved remarkable success in modeling diverse phenomena traditionally associated with computer graphics, such as 3D visual effects, rendering processes, and motion in videos. In this work, we take a step further by investigating whether modern Transformer techniques can tackle the challenging task of cloth simulation. To this end, we present ClothTransformer, a framework that reformulates cloth simulation as autoregressive sequence modeling in a learned latent space. Existing neural cloth simulators are largely specialized to single scenarios, intrinsically coupled to the mesh discretization, and lack robust collision handling. Our approach addresses these limitations through three contributions: (1) a unified Transformer architecture that handles diverse scenarios -- body-driven garments, robotic manipulation, and free-fall collisions -- under a single model and achieves approximately $4$--$9{\times}$ lower error than prior state-of-the-art methods across all scenarios; (2) a scalable latent-space formulation that compresses arbitrary-resolution meshes into a fixed-size set of latent tokens, making temporal dynamics computation independent of mesh resolution; and (3) a diverse-scenario high-fidelity penetration-free dataset of ${\sim}$493.4k frames spanning all three settings, which enables a differentiable Continuous Collision Detection (CCD) module to suppress penetration artifacts.
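The abstract does not say how a mesh is compressed into a fixed-size token set. One common way to get an output size that does not depend on the input size is to let a fixed number of query vectors cross-attend over the per-vertex features. The sketch below illustrates only that idea and is not the authors' encoder: the function names, the sizes `DIM` and `NUM_LATENTS`, and the random stand-ins for learned queries and vertex features are all assumptions, and the learned key/value projections of a real attention layer are omitted.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def compress_to_latents(vertex_feats, latent_queries):
    """Cross-attend K query vectors over N vertex features, giving K tokens.

    Hypothetical illustration: keys and values are the raw vertex features,
    with no learned projections.
    """
    dim = len(latent_queries[0])
    scale = 1.0 / math.sqrt(dim)
    latents = []
    for query in latent_queries:
        # One attention score per vertex for this query.
        scores = [scale * sum(q * v for q, v in zip(query, feat))
                  for feat in vertex_feats]
        weights = softmax(scores)
        # The token is the attention-weighted average of vertex features.
        token = [sum(w * feat[d] for w, feat in zip(weights, vertex_feats))
                 for d in range(dim)]
        latents.append(token)
    return latents


def random_features(count, dim, rng):
    """Random stand-in for learned queries or encoded vertex features."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
DIM, NUM_LATENTS = 8, 4  # assumed sizes, chosen only for the demo
queries = random_features(NUM_LATENTS, DIM, rng)

coarse = compress_to_latents(random_features(50, DIM, rng), queries)
fine = compress_to_latents(random_features(500, DIM, rng), queries)

print(len(coarse), len(coarse[0]))  # 4 8
print(len(fine), len(fine[0]))      # 4 8
```

Both the 50-vertex and the 500-vertex mesh yield the same 4 x 8 token set, so any temporal model that operates on these tokens has a per-step cost set by the number of latents rather than by the mesh resolution.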

Architecture Design (Transformers, SSMs, MoE) | World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 40

Year: 2026

Venue: N/A
