Existing spatio-temporal video grounding methods struggle with long videos; this new autoregressive transformer handles them efficiently by processing frames sequentially and using memory banks with selection strategies.
Sparrow delivers 2.8x faster inference for Video LLMs on long videos by offloading visual computation to the target model using text-anchored attention and semantically rich intermediate states.