This report details a reverse-engineering effort to understand the architecture of Black Forest Labs' FLUX text-to-image diffusion model, which lacks official documentation despite its state-of-the-art performance. The analysis extracts architectural details directly from the open-source code. The report aims to facilitate the use of FLUX as a foundation for future research by providing a technical understanding of its inner workings.
FLUX.1's architectural secrets are revealed, offering a blueprint for building next-generation text-to-image models.
FLUX.1 is a diffusion-based text-to-image generation model developed by Black Forest Labs, designed to achieve faithful text-image alignment while maintaining high image quality and diversity. FLUX is considered state of the art in text-to-image generation, outperforming popular models such as Midjourney, DALL-E 3, Stable Diffusion 3 (SD3), and SDXL. Although the model is publicly available as open source, its authors have not released official technical documentation detailing its architecture or training setup. This report summarizes an extensive reverse-engineering effort aimed at demystifying FLUX's architecture directly from its source code, to support its adoption as a backbone for future research and development. This document is an unofficial technical report and is not published or endorsed by the original developers or their affiliated institutions.