A single spatial token, learned via occupancy prediction on a massive dataset, is surprisingly effective at injecting crucial spatial awareness into vision-language navigation, leading to state-of-the-art performance.
Forget hand-crafted heuristics: this new dynamics-aware policy learns to exploit contact forces in cluttered environments, outperforming traditional methods by 25% in simulation and showing impressive sim-to-real transfer.
Reconstructing scenes from video for simulation is now more realistic thanks to a pipeline that optimizes viewpoints for object generation and uses scene graphs to ensure physical plausibility.
Training a robot foundation model on 30,000 hours of heterogeneous embodied data lets it outperform prior methods by up to 48% on complex manipulation tasks and even benefit from low-quality data.
Forget painstakingly labeled real-world data: GraspVLA proves you can train a surprisingly capable grasping foundation model on a billion frames of purely synthetic action data.