Mar 19, 2026 · arXiv:2603.19053

SwiftTailor: Efficient 3D Garment Generation with Geometry Image Representation

Phuc Pham, Uy Dieu Tran, Binh-Son Hua, P. Nguyen, Phong Nguyen

AI Summary

SwiftTailor is a two-stage framework for efficient 3D garment generation that uses a geometry image representation to unify sewing-pattern reasoning and mesh synthesis. PatternMaker, a vision-language model, predicts sewing patterns; GarmentSewer, a dense prediction transformer, converts them into a Garment Geometry Image that encodes the 3D surface. An inverse mapping process with remeshing and dynamic stitching then reconstructs the final mesh. On Multimodal GarmentCodeData, the authors report state-of-the-art accuracy and visual fidelity with significantly reduced inference time compared to existing methods.

Key Contribution

Forget waiting a minute for garment generation: SwiftTailor slashes inference times while boosting accuracy by representing 3D garments as geometry images.

Abstract

Realistic and efficient 3D garment generation remains a longstanding challenge in computer vision and digital fashion. Existing methods typically rely on large vision-language models to produce serialized representations of 2D sewing patterns, which are then transformed into simulation-ready 3D meshes using a garment modeling framework such as GarmentCode. Although these approaches yield high-quality results, they often suffer from slow inference times, ranging from 30 seconds to a minute. In this work, we introduce SwiftTailor, a novel two-stage framework that unifies sewing-pattern reasoning and geometry-based mesh synthesis through a compact geometry image representation. SwiftTailor comprises two lightweight modules: PatternMaker, an efficient vision-language model that predicts sewing patterns from diverse input modalities, and GarmentSewer, an efficient dense prediction transformer that converts these patterns into a novel Garment Geometry Image, encoding the 3D surface of all garment panels in a unified UV space. The final 3D mesh is reconstructed through an efficient inverse mapping process that incorporates remeshing and dynamic stitching algorithms to directly assemble the garment, thereby amortizing the cost of physical simulation. Extensive experiments on the Multimodal GarmentCodeData demonstrate that SwiftTailor achieves state-of-the-art accuracy and visual fidelity while significantly reducing inference time. This work offers a scalable, interpretable, and high-performance solution for next-generation 3D garment generation.
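
To make the idea of inverse mapping from a geometry image concrete, here is a minimal sketch of the generic technique: each valid pixel of an image storing XYZ positions becomes a vertex, and each fully valid 2×2 pixel cell becomes two triangles. This is not the paper's implementation. The function name `geometry_image_to_mesh`, the `(H, W, 3)` array layout, and the boolean validity mask are assumptions made for illustration, and the remeshing and dynamic stitching steps described in the abstract are omitted.

```python
import numpy as np


def geometry_image_to_mesh(geom, mask):
    """Triangulate a geometry image (hypothetical helper, not from the paper).

    geom: (H, W, 3) float array, XYZ position stored at each pixel.
    mask: (H, W) bool array, True where the pixel belongs to a panel.
    Returns (vertices, faces) with shapes (N, 3) and (M, 3).
    """
    h, w = mask.shape

    # Give every valid pixel a vertex index in row-major order; -1 elsewhere.
    index = np.full((h, w), -1, dtype=np.int64)
    index[mask] = np.arange(mask.sum())
    vertices = geom[mask]

    # Vertex indices at the four corners of each 2x2 pixel cell.
    a = index[:-1, :-1]  # top-left
    b = index[:-1, 1:]   # top-right
    c = index[1:, :-1]   # bottom-left
    d = index[1:, 1:]    # bottom-right

    # Keep only cells whose four corners are all valid pixels.
    full = (a >= 0) & (b >= 0) & (c >= 0) & (d >= 0)

    # Split each kept cell into two triangles along the b-c diagonal.
    tri1 = np.stack([a[full], c[full], b[full]], axis=1)
    tri2 = np.stack([b[full], c[full], d[full]], axis=1)
    faces = np.concatenate([tri1, tri2], axis=0)
    return vertices, faces


# Usage: a flat 3x3 grid gives 9 vertices and 4 cells, hence 8 triangles.
ys, xs = np.mgrid[0:3, 0:3]
geom = np.stack([xs, ys, np.zeros_like(xs)], axis=-1).astype(float)
mask = np.ones((3, 3), dtype=bool)
vertices, faces = geometry_image_to_mesh(geom, mask)
print(vertices.shape, faces.shape)  # (9, 3) (8, 3)
```

Because the mesh connectivity is implied by the pixel grid, reconstruction of this kind needs no simulation step; pixels outside any panel are simply masked out, which leaves open seams between panels that a stitching stage would then have to close.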

Computer Vision, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 48

Year: 2026

Venue: N/A
