The paper identifies and quantifies a previously undocumented "Order-to-Space Bias" (OTS) in image generation models, where the order in which entities are mentioned in a text prompt influences their spatial arrangement in the generated image. It introduces OTS-Bench, a new benchmark that measures this bias along two dimensions: homogenization (similarity of images generated from prompts with different entity orders) and correctness (adherence to grounded cues). Experiments demonstrate that OTS is prevalent in text-to-image and image-to-image models, appears to be primarily data-driven, and emerges during the early stages of layout formation; targeted fine-tuning and early-stage intervention are shown to mitigate the bias.
Image generation models exhibit a surprising "Order-to-Space Bias": the order in which you mention objects in a prompt can drastically alter their placement in the generated image, even overriding other visual cues.
We study a systematic bias in modern image generation models: the mention order of entities in text spuriously determines spatial layout and entity-role binding. We term this phenomenon Order-to-Space Bias (OTS) and show that it arises in both text-to-image and image-to-image generation, often overriding grounded cues and causing incorrect layouts or swapped assignments. To quantify OTS, we introduce OTS-Bench, which isolates order effects with paired prompts differing only in entity order and evaluates models along two dimensions: homogenization and correctness. Experiments show that OTS is widespread in modern image generation models and provide evidence that it is primarily data-driven and manifests during the early stages of layout formation. Motivated by this insight, we show that both targeted fine-tuning and early-stage intervention strategies can substantially reduce OTS while preserving generation quality.
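To make the paired-prompt idea concrete, here is a minimal Python sketch of how prompts that differ only in entity order might be built, together with one possible proxy for an order effect. Everything in it is an assumption for illustration: the names `PromptPair`, `make_pair`, and `first_mentioned_left_rate`, the prompt template, and the left-to-right reading of "space" are hypothetical and are not taken from OTS-Bench, whose actual prompts and metrics are not specified in the abstract.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class PromptPair:
    """Two prompts that differ only in the order the entities are mentioned."""
    forward: str  # entities mentioned as (a, b)
    swapped: str  # same template, entities mentioned as (b, a)


def make_pair(template: str, a: str, b: str) -> PromptPair:
    """Fill a template with placeholders {first} and {second} in both orders."""
    return PromptPair(
        forward=template.format(first=a, second=b),
        swapped=template.format(first=b, second=a),
    )


def first_mentioned_left_rate(layouts: Sequence[Tuple[float, float]]) -> float:
    """Hypothetical proxy, not the paper's metric.

    Each item is (x_first, x_second): the horizontal centers of the
    first-mentioned and second-mentioned entity in one generated image,
    as produced by some external detector (assumed, not shown here).
    Returns the fraction of images where the first-mentioned entity
    lies to the left of the second-mentioned one.
    """
    if not layouts:
        raise ValueError("layouts must not be empty")
    hits = sum(1 for x_first, x_second in layouts if x_first < x_second)
    return hits / len(layouts)


pair = make_pair("a photo of {first} and {second}", "a cat", "a dog")
print(pair.forward)
print(pair.swapped)
```

Under these assumptions, a rate near 0.5 across many samples would suggest that mention order has little influence on horizontal placement, while a rate far from 0.5 for both prompts of a pair would suggest that layout follows mention order rather than entity identity.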