Search papers, labs, and topics across Lattice.
2
0
4
4
Leaderboard-topping video models are still surprisingly brittle, failing on basic video reasoning tasks unless given the right textual cues.
Ditch autoregressive MLLMs: Omni-Diffusion proves that mask-based discrete diffusion models can unify multimodal understanding and generation across text, speech, and images with competitive performance.