Huiyu Duan

Current video generation benchmarks overlook crucial aspects of physical plausibility and temporal coherence, highlighting the need for holistic evaluation metrics like PhyScore.

Yiting Lu, Fengbin Guan, Zhibo Chen +26

Eval Frameworks & Benchmarks Multimodal Models World Models & Planning

Apr 7, 2026

Yushuo Zheng +8 · Apr 7, 2026

Market-Bench: Benchmarking Large Language Models on Economic and Trade Competition

LLMs with similar semantic skills show wildly different economic performance in simulated markets, revealing that reasoning about competition and resource allocation remains a major challenge.

Yushuo Zheng, Huiyu Duan, Huiyu Duan +6

Eval Frameworks & Benchmarks Reasoning & Chain-of-Thought Tool Use & Agents

Papers (3)