Huaisong Zhang

Today's best AI agents can complete only 33% of common online tasks, such as booking appointments or filling out job applications, revealing a significant gap between current capabilities and real-world utility.

Yuxuan Zhang, Yubo Wang, Yipeng Zhu +19

Eval Frameworks & Benchmarks, Natural Language Processing, Tool Use & Agents

Apr 6, 2026

Watch Before You Answer: Learning from Visually Grounded Post-Training

Current video understanding benchmarks and post-training datasets are riddled with linguistic biases, meaning VLMs might be acing tests without actually "watching" the video.

Eunjeong Hwang, Huaisong Zhang, Penghui Du +7

Computer Vision, Eval Frameworks & Benchmarks, Multimodal Models

Publication activity (papers/week, last 8 weeks)

Papers (3)