The paper introduces an interpretation of how color is represented in the latent space of the Variational Autoencoder (VAE) used by the FLUX text-to-image model, revealing an organization that mirrors Hue, Saturation, and Lightness (HSL). This "Latent Color Subspace" (LCS) is then used to predict and control color in generated images without any additional training. The method shows that a semantic image property such as color can be manipulated directly through closed-form latent-space operations.
Unlock precise, training-free color control in text-to-image models by directly manipulating the latent space's emergent Hue, Saturation, and Lightness structure.
Text-to-image generation models have advanced rapidly, yet achieving fine-grained control over generated images remains difficult, largely due to limited understanding of how semantic information is encoded. We develop an interpretation of the color representation in the Variational Autoencoder latent space of FLUX.1 [Dev], revealing a structure reflecting Hue, Saturation, and Lightness. We verify our Latent Color Subspace (LCS) interpretation by demonstrating that it can both predict and explicitly control color, introducing a fully training-free method in FLUX based solely on closed-form latent-space manipulation. Code is available at https://github.com/ExplainableML/LCS.
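To make "closed-form latent-space manipulation" concrete, here is a minimal sketch under stated assumptions. It is not the LCS method from the paper or the linked repository. It assumes a hypothetical orthonormal basis `U` whose first two columns span a "chroma" plane and whose third column is a lightness axis, so that hue and saturation are the polar angle and radius in that plane. The names `color_coords` and `shift_hue`, the basis `U`, and the channel count are illustrative assumptions. Reading the coordinates corresponds to "predicting" color, and rotating them about the lightness axis corresponds to "controlling" it.

```python
import numpy as np

def color_coords(z, U):
    """Project latents z (N, C) onto a hypothetical color basis U (C, 3).

    ASSUMPTION: columns 0-1 of U span a chroma plane, column 2 is a lightness axis.
    Returns (hue, saturation, lightness) per latent vector.
    """
    a, b, l = (z @ U).T
    hue = np.arctan2(b, a)   # angle in the chroma plane, in radians
    sat = np.hypot(a, b)     # distance from the lightness axis
    return hue, sat, l

def shift_hue(z, U, delta):
    """Closed-form hue shift: rotate chroma-plane coordinates by delta radians.

    Lightness and every component outside span(U) are left unchanged.
    """
    coords = z @ U                      # (N, 3) coordinates inside the subspace
    c, s = np.cos(delta), np.sin(delta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])     # rotation about the lightness axis
    new_coords = coords @ R.T
    # Swap the in-subspace component; the orthogonal complement is untouched.
    return z + (new_coords - coords) @ U.T

# Illustrative usage with a random orthonormal basis (NOT a learned color subspace).
rng = np.random.default_rng(0)
C = 16                                   # channel count chosen for illustration
U, _ = np.linalg.qr(rng.standard_normal((C, 3)))
z = rng.standard_normal((8, C))          # 8 stand-in latent vectors
z_shifted = shift_hue(z, U, 0.5)
```

Because the edit is a rotation inside an orthonormal subspace, it is exact and invertible (`shift_hue(z_shifted, U, -0.5)` recovers `z`), which is the kind of training-free property a closed-form operation provides. Whether FLUX's actual color structure is this simple is for the paper to establish, not this sketch.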