Mar 16, 2026 · arXiv:2603.14957

CyCLeGen: Cycle-Consistent Layout Prediction and Image Generation in Vision Foundation Models

Xiaojun Shan, Haoyu Shen, Yucheng Mao, Xiang Zhang, Abhay Anand, Bingnan Li, Haiyang Xu, Zhuowen Tu

AI Summary

CyCLeGen is a unified vision-language foundation model that uses a single autoregressive framework for both image understanding and image generation. It enforces cycle-consistent learning through image-to-layout-to-image and layout-to-image-to-layout generation loops, which provide introspection (the model can reason about its own generations) and data efficiency (the model can improve itself from synthetic supervision). Experiments show significant gains across a range of image understanding and generation benchmarks, highlighting the potential of unified vision-language models.

Key Contribution

Cycle-consistent learning unlocks self-improvement in vision-language models, enabling them to reason about their own generations and boosting performance across understanding and generation tasks.

Abstract

We present CyCLeGen, a unified vision-language foundation model capable of both image understanding and image generation within a single autoregressive framework. Unlike existing vision models that depend on separate modules for perception and synthesis, CyCLeGen adopts a fully integrated architecture that enforces cycle-consistent learning through image->layout->image and layout->image->layout generation loops. This unified formulation introduces two key advantages: introspection, enabling the model to reason about its own generations, and data efficiency, allowing self-improvement via synthetic supervision under a reinforcement learning objective guided by cycle consistency. Extensive experiments show that CyCLeGen achieves significant gains across diverse image understanding and generation benchmarks, highlighting the potential of unified vision-language foundation models.
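The abstract does not say how cycle consistency is scored. As a hedged illustration only, the sketch below assumes the layout-to-image-to-layout loop is rewarded by comparing the layout recovered from the generated image against the source layout with same-label box IoU, and that per-sample rewards are turned into group-normalised advantages in the style of GRPO. `Box`, `layout_cycle_reward`, and `group_advantages` are hypothetical names; the image generator and layout predictor are omitted, and this is not the authors' implementation.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Box:
    """Hypothetical layout element: a class label plus an axis-aligned box."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix0, iy0 = max(a.x0, b.x0), max(a.y0, b.y0)
    ix1, iy1 = min(a.x1, b.x1), min(a.y1, b.y1)
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a.x1 - a.x0) * (a.y1 - a.y0)
    area_b = (b.x1 - b.x0) * (b.y1 - b.y0)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def layout_cycle_reward(source: list[Box], recovered: list[Box]) -> float:
    """Assumed reward: greedy one-to-one matching of same-label boxes by IoU.

    Dividing by the larger layout size penalises both missing and extra boxes.
    """
    if not source and not recovered:
        return 1.0
    # All same-label pairs, highest IoU first.
    candidates = sorted(
        ((iou(s, r), i, j)
         for i, s in enumerate(source)
         for j, r in enumerate(recovered)
         if s.label == r.label),
        reverse=True,
    )
    used_s: set[int] = set()
    used_r: set[int] = set()
    total = 0.0
    for score, i, j in candidates:
        if i in used_s or j in used_r:
            continue  # each box may be matched at most once
        used_s.add(i)
        used_r.add(j)
        total += score
    return total / max(len(source), len(recovered))


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalise rewards within a group of samples drawn for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    source = [Box("dog", 0, 0, 10, 10), Box("cat", 20, 20, 30, 30)]
    recovered = [Box("dog", 0, 0, 10, 5)]  # dog half-covered, cat missing
    print(layout_cycle_reward(source, recovered))  # 0.25
    print(group_advantages([1.0, 0.0]))
```

Greedy matching keeps the sketch short; an optimal assignment could instead be computed with Hungarian matching (for example `scipy.optimize.linear_sum_assignment`). The image-to-layout-to-image loop would need a separate image-similarity reward, which is not sketched here.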

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Multimodal Models

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
