Forget expensive human feedback loops: a VLM-powered reward function can efficiently align image editing diffusion models with human preferences.
Coordinating embodied multi-agent systems doesn't require end-to-end training; instead, offload planning to a VLM in simulation, then transfer the resulting plans to the real world for execution.
Current image editing models, even closed-source ones, still fall short on complex and creative instruction-based tasks, as revealed by a new interpretable QA-based evaluation framework.
Foundation models can be tamed to reconstruct realistic 4D interactions between hands and articulated objects from a single RGB video, even without pre-scanning or multi-view data.
RL's inherent resilience to catastrophic forgetting can be harnessed to improve continual learning in GUI agents, outperforming SFT alone.
Achieve spatially precise image edits in complex scenes by explicitly reasoning about object positions in text *before* visual grounding.
Achieve more realistic and coherent 4D scene representations by modeling motion within the SE(3) Lie group, outperforming NeRF-based methods.
Imagine AI scientists that not only reason but also autonomously conduct experiments in the real world – that's the promise of Intelligent Science Laboratories.