Search papers, labs, and topics across Lattice.
FASH-iCNN, a multimodal CNN trained on a large dataset of Vogue runway images, is introduced to inspect the encoded cultural logic within fashion AI systems. The system predicts the fashion house, era, and color tradition of a garment from an image. Analysis of visual channels reveals that texture and luminance are the primary carriers of editorial identity, as removing texture significantly impacts house identity accuracy compared to removing color.
Texture, not color, is the secret sauce behind fashion house identity, revealed by probing a multimodal CNN trained on decades of Vogue runway images.
Fashion AI systems routinely encode the aesthetic logic of specific houses, editors, and historical moments without disclosing it. We present FASH-iCNN, a multimodal system trained on 87,547 Vogue runway images across 15 fashion houses spanning 1991-2024 that makes this cultural logic inspectable. Given a photograph of a garment, the system recovers which house produced it, which era it belongs to, and which color tradition it reflects. A clothing-only model identifies the fashion house at 78.2% top-1 across 14 houses, the decade at 88.6% top-1, and the specific year at 58.3% top-1 across 34 years with a mean error of just 2.2 years. Probing which visual channels carry this signal reveals a sharp dissociation: removing color costs only 10.6pp of house identity accuracy, while removing texture costs 37.6pp, establishing texture and luminance as the primary carriers of editorial identity. FASH-iCNN treats editorial culture as the signal rather than background noise, identifying which houses, eras, and color traditions shaped each output so that users can see not just what the system predicts but which houses, editors, and historical moments are encoded in that prediction.