…B [26] visual backbone. The action head is a conditional Flow Matching network implemented as an 8-layer Diffusion Transformer (DiT [16]) with a hidden dimension of 1024, trained to predict trajectories of horizon T= …

Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications, Ministry of Education, China
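The snippet does not spell out the flow-matching objective, so what follows is a minimal NumPy sketch under stated assumptions. It uses a straight-line (rectified-flow) path from Gaussian noise at t=0 to the ground-truth action chunk at t=1, an MSE loss on the velocity, and a plain Euler sampler. The function names (`fm_training_pair`, `fm_loss`, `sample_actions`, `oracle_velocity`) are hypothetical. `velocity_fn` stands in for the 8-layer DiT head, and `oracle_velocity` is a toy field that is handed the target, used only to check that the training target and the sampler agree.

```python
import numpy as np


def fm_training_pair(actions, rng):
    """Build one conditional flow-matching example for an action chunk.

    actions: ground-truth trajectory, shape (horizon, action_dim).
    Assumption: straight path from noise at t=0 to data at t=1.
    """
    noise = rng.standard_normal(actions.shape)  # x0 ~ N(0, I)
    t = rng.uniform()  # flow time, sampled in [0, 1)
    x_t = (1.0 - t) * noise + t * actions  # point on the straight path
    target_velocity = actions - noise  # dx_t/dt along that path
    return x_t, t, target_velocity


def fm_loss(velocity_fn, actions, cond, rng):
    """Mean squared error between predicted and target velocity.

    velocity_fn(x_t, t, cond) stands in for the DiT action head;
    cond stands in for the observation features it is conditioned on.
    """
    x_t, t, target_velocity = fm_training_pair(actions, rng)
    pred = velocity_fn(x_t, t, cond)
    return float(np.mean((pred - target_velocity) ** 2))


def sample_actions(velocity_fn, cond, shape, rng, num_steps=10):
    """Generate a trajectory by Euler-integrating the field from t=0 to t=1."""
    x = rng.standard_normal(shape)  # start from pure noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity_fn(x, k * dt, cond)  # k * dt < 1 always
    return x


def oracle_velocity(x_t, t, cond):
    """Hypothetical 'perfect' head for one known trajectory (cond = target).

    For a straight path ending at cond, the exact velocity is
    (cond - x_t) / (1 - t).
    """
    return (cond - x_t) / (1.0 - t)


# Usage with placeholder sizes (horizon 16, 7-D actions); not the paper's values.
rng = np.random.default_rng(0)
demo_actions = rng.standard_normal((16, 7))
demo_loss = fm_loss(oracle_velocity, demo_actions, demo_actions, rng)  # ~0 for the oracle
demo_sample = sample_actions(oracle_velocity, demo_actions, demo_actions.shape, rng)
```

The oracle reaches zero loss and recovers the target only because it is given the true trajectory as its condition; a trained DiT would instead regress `target_velocity` from observation features, and `num_steps` trades sampling speed against integration error.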
A 5B-parameter model just beat models 5-16x its size at both image generation and image editing, thanks to smarter feature fusion and a novel RL training strategy.