Search papers, labs, and topics across Lattice.
Shanghai Jiao Tong University
4
0
8
Reward models that adapt to fine-grained, task-specific criteria can significantly improve text-to-image generation by better aligning with user preferences.
Current video generation benchmarks overlook crucial aspects of physical plausibility and temporal coherence, highlighting the need for holistic evaluation metrics like PhyScore.
Current video-to-audio models are surprisingly bad at generating speech and singing, exposing critical gaps in multimodal understanding.
LLMs with similar semantic skills show wildly different economic performance in simulated markets, revealing that reasoning about competition and resource allocation remains a major challenge.