Zhengyuan Yang

Research focus

Multimodal Models (4), Tool Use & Agents (2), Computer Vision (2), Robotics & Embodied AI (1), World Models & Planning (1)

Frequent co-authors

Lijuan Wang (4), Shiqi Chen (2), Li Fei-Fei (2), Zezi Zeng (2)

Papers (5)

May 28, 2026

Microsoft Research · also Stanford HAI, Northwestern, Oxford, Stanford University

Planning with the Views via Scene Self-Exploration

VLMs can learn to actively reason and plan in 3D environments by distilling view graphs from self-exploration trajectories, enabling them to surpass even larger models like GPT-4 Pro and Gemini 1.5 Pro on interactive view planning.

Kangrui Wang, Linjie Li +14

Multimodal Models, Robotics & Embodied AI, World Models & Planning

Apr 16, 2026

Microsoft Research

MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

Hierarchical planning and self-reflection can finally wrangle AIGC tools into producing coherent, visually consistent webpages.

Zezi Zeng, Yifan Yang, Yuqing Yang +8

Code Generation & Program Synthesis, Multimodal Models, Tool Use & Agents

Apr 8, 2026

NUS · also Central South University, Tencent AI

FlowInOne: Unifying Multimodal Generation as Image-in, Image-out Flow Matching

Forget text-centric pipelines: FlowInOne achieves SOTA multimodal generation by unifying text, layouts, and instructions into a single visual flow, outperforming both open-source and commercial systems.

Junchao Yi, Weixian Lei, Qi Su +3

Computer Vision, Multimodal Models

Apr 7, 2026

AI2 · also Stanford HAI, Oxford

RAGEN-2: Reasoning Collapse in Agentic RL

LLM agents can appear to reason well (high entropy) while completely ignoring the input, and mutual information is a far better metric for catching this failure.

Chi Gui, Xing Jin +9

Reasoning & Chain-of-Thought, RLHF & Preference Learning, Tool Use & Agents

Mar 26, 2026

Yan Li +15 · also Microsoft Research

BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation

A new systematic benchmark shows that current image generation models fall far short of the structured, multi-constraint demands of real-world commercial design.

Yan Li, Zezi Zeng, Ziwei Zhou +13

Computer Vision, Eval Frameworks & Benchmarks, Multimodal Models
