School of Computer Science and Technology, Beijing Institute of Technology, Zhongguancun Academy
Multimodal large language models (MLLMs) can be significantly improved by directly supervising their visual tokens with the corresponding text, without architectural changes or extra computation.