HiDream.ai Inc, HIT | Jun 9, 2026 | arXiv:2606.11148

MOFA-VTON: More Fashion Possibilities with Fine-Grained Adaptations in Virtual Try-On

Xiaoyu Han, Chenyang Wang, Jing Wang, Shunyuan Zheng, Quanling Meng, Shengping Zhang

AI Summary

This paper introduces MOFA-VTON, a virtual try-on method that lets users control how the target clothing is worn by drawing simple curve sketches. A mask construction strategy converts each sketch into a dual-region mask, which replaces the traditional clothing-agnostic mask and gives the generator fine-grained layout guidance. Layout adjustment blocks then use cross-attention to learn layout correspondences for the upper and lower body regions independently, refining the spatial arrangement of the clothing in each region. The authors report that, in experiments on the VITON-HD and DressCode datasets, MOFA-VTON outperforms previous state-of-the-art methods while supporting more varied dressing styles.

Key Contribution

Users can sketch how they want the target clothing to be worn, so the try-on result follows their chosen layout instead of a single fixed dressing pattern.

Abstract

Virtual try-on aims to fit an in-shop clothing image onto a specific human body. An optimal virtual try-on method should provide diverse and flexible dressing options, accurately reflecting the varied wearing styles encountered in real-life scenarios, tailored to individual preferences and fashion aspirations. However, current methods predominantly perform a direct replacement of the original clothing with the target clothing, following the same dressing pattern. This limited control over clothing adaptation may result in fixed and monotonous try-on outputs. To delve into More Fashion Possibilities with Fine-Grained Adaptations in Virtual Try-On, we propose a novel virtual try-on method, termed MOFA-VTON, which allows adjustment for clothing adaptations in try-on results through simple sketches by users. Specifically, we first design a mask construction strategy that transforms user-drawn curve sketches into a dual-region mask, replacing the traditional clothing-agnostic mask and providing fine-grained layout guidance for the subsequent generation process. Further, we propose layout adjustment blocks that utilize the cross-attention mechanism to independently learn layout correspondences for upper and lower regions of the human body, refining the spatial arrangement of the two regions. With these implementations, our method enables flexible and fine-grained adaptations of target clothing, overcoming the constraints of a fixed layout. Extensive experiments on VITON-HD and DressCode datasets demonstrate that our proposed MOFA-VTON outperforms previous state-of-the-art methods and provides more fashion possibilities for virtual try-on.
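The two components named in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the abstract does not specify how the curve is rasterized or how the layout adjustment blocks are built. The function names `dual_region_mask` and `region_cross_attention`, the label convention (0 = untouched, 1 = upper, 2 = lower), the single-valued left-to-right curve, and the per-region projection matrices are all assumptions made for illustration.

```python
import numpy as np


def dual_region_mask(agnostic_mask, curve_points):
    """Split a binary mask into upper and lower regions along a sketched curve.

    Assumption: the curve is single-valued in x (one y per column).
    agnostic_mask: (H, W) bool array, True where clothing may be generated.
    curve_points:  iterable of (x, y) pixel coordinates along the curve.
    Returns an (H, W) uint8 array: 0 = untouched, 1 = upper, 2 = lower.
    """
    h, w = agnostic_mask.shape
    pts = np.asarray(sorted(curve_points), dtype=float)
    # Interpolate the boundary height for every column; np.interp holds the
    # end values constant outside the sketched x-range.
    boundary_y = np.interp(np.arange(w), pts[:, 0], pts[:, 1])
    rows = np.arange(h)[:, None]
    above = rows < boundary_y[None, :]
    out = np.zeros((h, w), dtype=np.uint8)
    out[agnostic_mask & above] = 1
    out[agnostic_mask & ~above] = 2
    return out


def region_cross_attention(person_feats, garment_feats, region_ids, weights):
    """Apply a separate cross-attention per region (hypothetical layout).

    person_feats:  (N, d) query tokens from the person image.
    garment_feats: (M, d) key/value tokens from the garment image.
    region_ids:    (N,) region label of each person token.
    weights:       dict mapping region label -> (Wq, Wk, Wv), each (d, d).
    Tokens whose label has no entry in `weights` pass through unchanged.
    """
    out = person_feats.copy()
    for region, (wq, wk, wv) in weights.items():
        idx = region_ids == region
        if not idx.any():
            continue
        q = person_feats[idx] @ wq
        k = garment_feats @ wk
        v = garment_feats @ wv
        scores = q @ k.T / np.sqrt(q.shape[-1])
        scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax
        attn = np.exp(scores)
        attn = attn / attn.sum(axis=-1, keepdims=True)
        out[idx] = person_feats[idx] + attn @ v  # residual update
    return out


if __name__ == "__main__":
    mask = np.ones((4, 6), dtype=bool)
    mask[:, 0] = False  # column 0 lies outside the editable area
    regions = dual_region_mask(mask, [(0, 2), (5, 2)])

    rng = np.random.default_rng(0)
    d = 8
    person = rng.normal(size=(regions.size, d))
    garment = rng.normal(size=(5, d))
    eye = np.eye(d)
    updated = region_cross_attention(
        person, garment, regions.ravel(), {1: (eye, eye, eye), 2: (eye, eye, eye)}
    )
    print(regions)
    print(updated.shape)
```

Keeping one set of projection weights per region label is just one way to read "independently learn layout correspondences for upper and lower regions"; a real block would use learned parameters inside a generative backbone.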

Computer Vision, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
