Mar 3, 2026 · arXiv:2603.02658

OmniFashion: Towards Generalist Fashion Intelligence via Multi-Task Vision-Language Learning

AI Summary

The authors address the fragmented supervision and incomplete annotations in fashion intelligence by constructing FashionX, a million-scale dataset with exhaustive fashion item annotations and hierarchical attribute organization. They then propose OmniFashion, a unified vision-language framework that bridges diverse fashion tasks under a dialogue paradigm, enabling multi-task reasoning and interactive dialogue. Experiments demonstrate strong task-level accuracy and cross-task generalization, suggesting a scalable path toward universal fashion intelligence.

Key Contribution

A new unified vision-language framework, OmniFashion, tackles diverse fashion tasks by reformulating them as dialogue, achieving strong accuracy and cross-task generalization.

Abstract

Fashion intelligence spans multiple tasks, namely retrieval, recommendation, recognition, and dialogue, yet remains hindered by fragmented supervision and incomplete fashion annotations. These limitations jointly restrict the formation of consistent visual-semantic structures, preventing recent vision-language models (VLMs) from serving as a generalist fashion brain that unifies understanding and reasoning across tasks. To address this, we construct FashionX, a million-scale dataset that exhaustively annotates visible fashion items within an outfit and organizes attributes from global to part-level. Built upon this foundation, we propose OmniFashion, a unified vision-language framework that bridges diverse fashion tasks under a single fashion dialogue paradigm, enabling both multi-task reasoning and interactive dialogue. Experiments on multiple subtasks and retrieval benchmarks show that OmniFashion achieves strong task-level accuracy and cross-task generalization, offering a scalable path toward universal, dialogue-oriented fashion intelligence.

Computer Vision · Multimodal Models · Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
