Pruning vision tokens in both the ViT and the LLM can yield a 62% efficiency boost in video VLMs with minimal performance loss and without complex text conditioning.