Zhongguancun Laboratory
Multimodal large language models (MLLMs) can be significantly improved by directly supervising their visual tokens with the corresponding text, with no architectural changes or extra computation.