School of Computer Science and Technology, Beijing Institute of Technology
Multimodal large language models (MLLMs) can be significantly improved by directly supervising visual tokens with their corresponding text, without architectural changes or extra computation.