Qwen Team, Alibaba Group; ⁴University of California San Diego; ⁵Zhejiang University; ⁶Shanghai Jiao Tong University. Equal contribution. Corresponding author: yang.yujiu@sz.tsinghua.edu.cn
Tsinghua AI2
Rather than scaling reasoning, this work argues that scaling visual perception with code-grounded data is the key to improving the STEM abilities of multimodal large language models (MLLMs).
Multimodal models are often "blind at birth": a proposed Visual Attention Score shows that they struggle to attend to visual inputs during cold-start training, and a simple attention-guided fix improves performance by 7%.