Jiaqi Wang

Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications, Ministry of Education, China

Research focus

Architecture Design (Transformers, SSMs, MoE) (1); Computer Vision (1); Multimodal Models (1)

Frequent co-authors

Ruihang Li (1), Feng Han (1), Wei Song (1), Siyuan Wang (1)

Papers (1)

Feb 12, 2026 · also Li Auto, Qingdao University Hospital, School of Computing and Artificial Intelligence, Shanghai Innovation +3

DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing

A 5B model outperforms models 5-16x larger at image generation and editing, thanks to smarter feature fusion and a novel RL training strategy.

Ruihang Li, Feng Han, Wei Song +14

Architecture Design (Transformers, SSMs, MoE); Computer Vision; Multimodal Models
