Search papers, labs, and topics across Lattice.
2
0
4
Compressing KV caches with multi-granularity representations boosts long video QA accuracy without sacrificing memory or speed.
Even the best multimodal LLMs are surprisingly bad at understanding and remembering the "self" in egocentric video, lagging human performance by 40-50% on personalized question answering.