Zhaoxiang Zhang

Papers on Lattice


Research focus

Multimodal Models (4) · World Models & Planning (3) · Eval Frameworks & Benchmarks (3) · Tool Use & Agents (3)

Frequent co-authors

Lue Fan (2) · Zheng Ju (2) · Hongxin Li (2) · Hongxin Li (2)

Papers (6)

Jun 10, 2026


World Pilot: Steering Vision-Language-Action Models with World-Action Priors

World Pilot reports an 84.7% success rate on zero-shot manipulation tasks by integrating anticipatory scene and motion priors into VLA models.

Zefu Lin, Rongxu Cui, Junjia Xu +4

Multimodal Models · Robotics & Embodied AI · World Models & Planning

Jun 1, 2026

Also: PKU

TVIR: Building Deep Research Agents Towards Text–Visual Interleaved Report Generation

TVIR-Agent shows that interleaving visual elements with text during report generation can substantially improve the quality and reliability of analytical reports.

Xinkai Ma, Zhiqi Bai, Dingling Zhang +22

Eval Frameworks & Benchmarks · Multimodal Models

May 25, 2026

Also: CUHK, PKU

MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research

Agents trained in MobileGym transfer surprisingly well to real-world mobile devices, retaining over 95% of the performance gains achieved in simulation.

Dingbang Wu, Rui Hao, Haiyang Wang +7

Eval Frameworks & Benchmarks · Tool Use & Agents · World Models & Planning

Apr 27, 2026

Also: New Laboratory of Pattern Recognition, PolyU

AutoGUI-v2: A Comprehensive Multi-Modal GUI Functionality Understanding Benchmark

Existing GUI agents can imitate actions, but AutoGUI-v2 shows that they still lack a deep understanding of GUI functionality and struggle to predict the outcomes of even simple interactions.

Hongxin Li, Hongxin Li, Xiping Wang +9

Eval Frameworks & Benchmarks · Multimodal Models · Tool Use & Agents

Also: New Laboratory of Pattern Recognition

GoClick: Lightweight Element Grounding Model for Autonomous GUI Interaction

You don't need billions of parameters to ground GUI elements accurately: GoClick, a 230M-parameter model, matches the performance of much larger models, opening the door to on-device GUI agents.

Hongxin Li, Hongxin Li, Yuntao Chen +3

Computer Vision · Multimodal Models · Tool Use & Agents

Mar 11, 2026


DynVLA: Learning World Dynamics for Action Reasoning in Autonomous Driving

By forecasting compact world dynamics before acting, DynVLA outperforms traditional chain-of-thought (CoT) methods and makes more informed, physically grounded autonomous driving decisions.

Shuyao Shang, Binghan Zhan, Yunfei Yan +9

Reasoning & Chain-of-Thought · Robotics & Embodied AI · World Models & Planning
