Doc-V* demonstrates that an agentic approach to multi-page document VQA, using active navigation and structured memory, can significantly outperform retrieval-augmented generation, especially in out-of-domain scenarios.
VideoLLMs can now watch and think *simultaneously*, achieving 15x faster response times and improved accuracy on video understanding tasks.
By jointly training a keyframe sampler with an MLLM, MSJoE achieves state-of-the-art accuracy in long-form video understanding while significantly reducing computational cost.
Unleashing powerful reasoning in OLLMs doesn't require expensive training data or compute; clever guidance from existing Large Reasoning Models is enough.