M Scores 3 ✗ ✓ ✗ Overall Quality KADID-10k [36] 10,125 Synthetic
Large-scale generative models struggle with low-level vision tasks, revealing critical performance gaps that conventional metrics fail to capture.
Current video generation benchmarks overlook crucial aspects of physical plausibility and temporal coherence, highlighting the need for holistic evaluation metrics like PhyScore.
LLMs with similar semantic skills show wildly different economic performance in simulated markets, revealing that reasoning about competition and resource allocation remains a major challenge.