Existing spatio-temporal video grounding methods struggle with long videos, but this new autoregressive transformer handles them efficiently by processing frames sequentially and keeping memory banks that are updated with selection strategies.
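
To make the idea of sequential processing with a selective memory bank concrete, here is a minimal sketch. It is not the paper's method: the `MemoryBank` class, the `process_video` helper, the dot-product relevance score, and the "evict the lowest-scoring entry" rule are all assumptions chosen for illustration, since the summary does not say which selection strategies are used.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product, used here as a stand-in relevance score (assumption)."""
    return sum(x * y for x, y in zip(a, b))


@dataclass
class MemoryBank:
    """Hypothetical fixed-capacity store of (score, frame index, feature) entries."""
    capacity: int
    entries: List[Tuple[float, int, List[float]]] = field(default_factory=list)

    def update(self, frame_idx: int, feature: List[float], query: List[float]) -> None:
        # Score the incoming frame against the text query.
        score = dot(feature, query)
        self.entries.append((score, frame_idx, feature))
        # Assumed selection strategy: once over capacity, drop the least relevant entry.
        if len(self.entries) > self.capacity:
            self.entries.remove(min(self.entries, key=lambda e: e[0]))

    def frame_indices(self) -> List[int]:
        # Return the retained frame indices in temporal order.
        return sorted(idx for _, idx, _ in self.entries)


def process_video(frames: List[List[float]], query: List[float], capacity: int) -> MemoryBank:
    """Visit frames one at a time, so memory stays bounded by `capacity`."""
    bank = MemoryBank(capacity)
    for idx, feature in enumerate(frames):
        bank.update(idx, feature, query)
    return bank


if __name__ == "__main__":
    # Toy 2-D "features"; a real model would use learned frame embeddings.
    frames = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.5, 0.5], [3.0, 0.0]]
    bank = process_video(frames, query=[1.0, 0.0], capacity=2)
    print(bank.frame_indices())
```

The point of the sketch is the cost profile: each step touches only the current frame and at most `capacity` stored entries, so memory does not grow with video length.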