Current video generation benchmarks overlook crucial aspects of physical plausibility and temporal coherence, highlighting the need for holistic evaluation metrics like PhyScore.
Today's best multimodal LLMs still struggle to grasp fine-grained details and reason across multiple entities in images, even with access to external knowledge.