This paper introduces FlowComposer, a framework for compositional zero-shot learning (CZSL) that uses flow matching to explicitly construct compositions in the embedding space. FlowComposer learns two primitive flows that transport visual features toward attribute and object text embeddings, then fuses their velocity fields into a composition flow with a learnable Composer. To address residual feature entanglement, the authors also propose a leakage-guided augmentation scheme that reuses leaked features as auxiliary signals. Experiments on three CZSL benchmarks show that FlowComposer consistently improves performance when integrated into various baselines.
FlowComposer targets two limitations of existing compositional zero-shot learning methods: it replaces implicit composition construction with an explicit fusion of attribute and object flows in the embedding space, and it reuses the features leaked by imperfect disentanglement as auxiliary signals.
Compositional zero-shot learning (CZSL) aims to recognize unseen attribute-object compositions by recombining primitives learned from seen pairs. Recent CZSL methods built on vision-language models (VLMs) typically adopt parameter-efficient fine-tuning (PEFT). They apply visual disentanglers for decomposition and manipulate token-level prompts or prefixes to encode compositions. However, such PEFT-based designs suffer from two fundamental limitations: (1) Implicit Composition Construction, where composition is realized only via token concatenation or branch-wise prompt tuning rather than an explicit operation in the embedding space; (2) Remained Feature Entanglement, where imperfect disentanglement leaves attribute, object, and composition features mutually contaminated. Together, these issues limit the generalization ability of current CZSL models. In this paper, we are the first to systematically study flow matching for CZSL and introduce FlowComposer, a model-agnostic framework that learns two primitive flows to transport visual features toward attribute and object text embeddings, and a learnable Composer that explicitly fuses their velocity fields into a composition flow. To exploit the inevitable residual entanglement, we further devise a leakage-guided augmentation scheme that reuses leaked features as auxiliary signals. We thoroughly evaluate FlowComposer on three public CZSL benchmarks by integrating it as a plug-and-play component into various baselines, consistently achieving significant improvements.
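To make the idea of fusing velocity fields concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a straight-line (rectified-flow style) path between a visual feature and a text embedding, uses hand-written oracle velocity fields in place of trained networks, and stands in for the learnable Composer with a fixed scalar gate. The names `compose`, `gate`, `euler_transport`, and the toy vectors are all hypothetical.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Point on the straight path from x0 (t=0) to x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1


def target_velocity(x0, x1):
    """Velocity of the straight path; it is constant in t."""
    return x1 - x0


def flow_matching_loss(velocity_fn, x0, x1, t):
    """Mean squared error between predicted and target velocity at x_t."""
    x_t = interpolate(x0, x1, t)
    pred = velocity_fn(x_t, t)
    return float(np.mean((pred - target_velocity(x0, x1)) ** 2))


def compose(v_attr, v_obj, gate):
    """Hypothetical Composer: convex combination of two primitive velocities.

    Assumption: the paper's Composer is learnable; a fixed scalar gate is
    used here only to show what fusing velocity fields means.
    """
    return gate * v_attr + (1.0 - gate) * v_obj


def euler_transport(velocity_fn, x0, steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with explicit Euler steps."""
    x = x0.copy()
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x


# Toy 4-d embeddings (made-up values, for illustration only).
x0 = np.array([0.0, 0.0, 0.0, 0.0])        # visual feature
attr_emb = np.array([1.0, 0.0, 2.0, 0.0])  # attribute text embedding
obj_emb = np.array([0.0, 1.0, 0.0, 2.0])   # object text embedding

# Oracle primitive flows: each returns the exact straight-path velocity.
v_attr = lambda x, t: attr_emb - x0
v_obj = lambda x, t: obj_emb - x0

# Composition flow obtained by fusing the two primitive velocity fields.
v_comp = lambda x, t: compose(v_attr(x, t), v_obj(x, t), gate=0.5)

attr_loss = flow_matching_loss(v_attr, x0, attr_emb, t=0.3)
composed = euler_transport(v_comp, x0)
```

In this toy setup the oracle attribute flow has zero flow matching loss, and transporting `x0` along the fused field ends at the midpoint of the two text embeddings. In practice the velocity fields and the fusion would be trained networks, and the endpoint would be compared against composition text embeddings rather than a simple average.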