OmniJigsaw reveals a "bi-modal shortcut phenomenon" in joint audio-visual integration: naive fusion can be surprisingly ineffective, underscoring the need for carefully designed cross-modal training strategies.
By jointly training a keyframe sampler with an MLLM, MSJoE achieves state-of-the-art accuracy in long-form video understanding while significantly reducing computational cost.
Unleashing powerful reasoning in OLLMs doesn't require expensive training data or compute – just clever guidance from existing Large Reasoning Models.