Shanghai Jiao Tong University. This work is partially supported by the Joint Funds of the National Natural Science Foundation of China (Grant No. U21B2020), the Special Fund for the Action Plan of Shanghai Jiao Tong University’s “Technological Revitalization of Mongolia” under Subcontract No. 2025XYJG0001-01-06, the National Natural Science Foundation of China (62406188), and the Natural Science Foundation of Shanghai (24ZR1440300). (Corresponding authors: Zhuosheng Zhang, Gongshen Liu.) Haodong Zhao, Jinming Hu, Zhuosheng Zhang, Haojin Zhu, and Gongshen Liu are with the School of Computer Science, Shanghai Jiao Tong University, Shanghai, China. Gongshen Liu is also with the Inner Mongolia Research Institute, Shanghai Jiao Tong University (e-mail: {zhaohaodong, hujinming, zhangzs, zhu-hj, lgshen}@sjtu.edu.cn). Yijie Bai and Wei Du are with Ant Group, China (e-mail: {baiyijie.byj, xiwei.dw}@antgroup.com). Tian Dong is with The University of Hong Kong, China (e-mail: tiandong@hku.hk). Yanjiao Chen is with the College of Electrical Engineering, Zhejiang University, Hangzhou, China (e-mail: chenyanjiao@zju.edu.cn).
Ditch the slow, iterative zooming during MLLM inference: Region-to-Image Distillation lets you bake those agentic zooming benefits directly into a single forward pass.