Haodong Yan

Ditch slow, multi-step video generation: S-VAM distills the structured generative priors of multi-step denoising into a single forward pass for real-time robot action prediction.

Haodong Yan, Zhide Zhong, Jiaguan Zhu +11

Computer Vision · Robotics & Embodied AI · World Models & Planning

Feb 26, 2026

Feb 26, 2026 · also Galbot, TU Munich, Xidian

Rethinking the Practicality of Vision-language-action Model: A Comprehensive Benchmark and An Improved Baseline

A practical VLA model, LLaVA-VLA, achieves strong generalization and versatility on a new benchmark, CEBench, while running on consumer-grade GPUs, eliminating the need for costly pre-training.

Wenxuan Song, Jiayi Chen, Xiaoquan Sun +11

Eval Frameworks & Benchmarks · Multimodal Models · Robotics & Embodied AI

Papers (3)