Nanjing University
Leaderboard-topping video models are still surprisingly brittle, failing on basic video reasoning tasks unless given the right textual cues.
Forget static, single-turn personalization: PersonaVLM unlocks long-term, evolving user alignment in MLLMs, even surpassing GPT-4o.
Ditch autoregressive MLLMs: Omni-Diffusion proves that mask-based discrete diffusion models can unify multimodal understanding and generation across text, speech, and images with competitive performance.