∗ They contributed equally to this work. Yue Zhao, Yujia Gong, Ruigang Liang, Shenchen Zhu, and Kai Chen are with the Institute of Information Engineering, Chinese Academy of Sciences (e-mail: zhaoyue@iie.ac.cn; gongyujia@iie.ac.cn; liangruigang@iie.ac.cn; zhushenchen@iie.ac.cn; chenkai@iie.ac.cn). Xuejing Yuan is with the Beijing University of Posts and Telecommunications (e-mail: yuanxuejing@bupt.edu.cn). Wangjun Zhang is with Guangzhou University (e-mail: wangjunzhang@e.gzhu.edu.cn).