LLaVA-1.5 can achieve similar or better vision-language performance with only 25% of the original vision tokens by spatially fusing redundant tokens.
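As a rough illustration of how a 25% token budget could arise from spatial fusion, the sketch below averages each non-overlapping 2x2 neighborhood of a square token grid, which keeps one token out of every four. The averaging operator, the function name `fuse_tokens_2x2`, and the 24x24 grid (576 tokens, the usual LLaVA-1.5 vision-encoder output) are assumptions for illustration; the summary above does not say how the tokens are actually fused.

```python
def fuse_tokens_2x2(tokens, grid):
    """Average each non-overlapping 2x2 block of a row-major token grid.

    tokens: list of grid*grid feature vectors (lists of floats).
    Returns (grid // 2) ** 2 fused vectors, i.e. 25% of the input count.
    Hypothetical helper: plain averaging stands in for the real fusion step.
    """
    if len(tokens) != grid * grid or grid % 2 != 0:
        raise ValueError("expected grid*grid tokens with an even grid size")
    dim = len(tokens[0])
    fused = []
    for row in range(0, grid, 2):
        for col in range(0, grid, 2):
            # Collect the four spatial neighbors that share this 2x2 block.
            block = [
                tokens[(row + dr) * grid + (col + dc)]
                for dr in (0, 1)
                for dc in (0, 1)
            ]
            fused.append([sum(t[k] for t in block) / 4 for k in range(dim)])
    return fused


# Assumed setup: 576 dummy tokens on a 24x24 grid, 8 features each.
tokens = [[float(i)] * 8 for i in range(24 * 24)]
fused = fuse_tokens_2x2(tokens, grid=24)
print(len(tokens), "->", len(fused))
```

A fixed 2x2 average treats every region of the image alike. A method that targets redundant tokens specifically would presumably choose what to merge based on token similarity, which this sketch does not attempt.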