Search papers, labs, and topics across Lattice.
2
0
5
1
Multimodal models can now achieve state-of-the-art performance in real-world tasks like document understanding and audio-video comprehension with significantly reduced inference latency thanks to novel token-reduction techniques.
Current multimodal LLMs choke on long-form video understanding, either forgetting details or getting lost in the timeline, but a new agentic architecture with dynamic memory management offers a promising fix.