Hang Zhang

Papers on Lattice

Publication activity (papers/week, last 8 weeks)

Research focus

Computer Vision (6), Multimodal Models (5), Reasoning & Chain-of-Thought (3), World Models & Planning (2)

Frequent co-authors

Mu Xu (3), Xu Wang (2), Changjie Wu (2), Lingjun Zhang (2)

Papers (10)

Jun 8, 2026

Tsinghua AI·2d ago

ABot-Earth 0.5: Generative 3D Earth Model

Generating realistic 3D environments from satellite imagery in under 10 minutes could revolutionize how we visualize and interact with our planet.

Ming Qian, Tianjian Ouyang, Mingchao Sun +26

Computer Vision, World Models & Planning

May 27, 2026

DAMO·2w ago

VeriTrip: A Verifiable Benchmark for Travel Planning Agents over Unstructured Web Corpora

Autonomous agents struggle to retain instructions when burdened with retrieving information from the open web, exposing a critical retrieval-reasoning trade-off.

Yuting Xu, Jiayi Tian, Jian Liang +4

Eval Frameworks & Benchmarks, Multimodal Models, Tool Use & Agents

May 26, 2026

Yuyang Tan +3·2w ago

TrackRef3D: Multi-View Consistent Track-then-Label for Open-World Referring Segmentation in 3D Gaussian Splatting

Achieve open-world 3D segmentation without manual annotation by decoupling object discovery from semantic grounding.

Yuyang Tan, Renhe Zhang, Hang Zhang +1

Computer Vision, Natural Language Processing, Robotics & Embodied AI

May 25, 2026

2w ago·also Tsinghua AI, Group, Shenzhen University of Advanced

ProSR: Process-Shaped Spatial Reasoning for Reliable Chain-of-Thought in VLMs

VLMs often fail at spatial reasoning because they either ignore visual cues or exhibit unstable reasoning, but a novel process-shaping framework can fix this.

Jiangyang Li, Cong Wan, Changjie Wu +8

Computer Vision, Multimodal Models, Reasoning & Chain-of-Thought

2w ago

What Makes a Medical Checker Trainable? Diagnosing Signal Collapse and Reward Hacking in Checker-Guided RAG for Biomedical QA

Seemingly strong NLI checkers can actually *hurt* medical RAG training by collapsing the RL gradient or triggering reward-hacking cascades like ultra-short answers and search avoidance.

Yuelyu Ji, Min Gu Kwak, Hang Zhang +3

Natural Language Processing, Recommendation & Information Retrieval, RLHF & Preference Learning

May 21, 2026

2w ago·also CAS, SJTU

4D-GSW: Kinematic-Aware Spatio-Temporal Consistent Watermarking for 4D Gaussian Splatting

Unlike naive approaches that cause flickering and visual artifacts, 4D-GSW embeds robust watermarks into dynamic 3D scenes by respecting the physics of motion.

Sifan Zhou, Hang Zhang, Yuhang Wang

Computer Vision

Mar 18, 2026

Tsinghua AI·Mar 18, 2026·also DAMO

Learning Transferable Temporal Primitives for Video Reasoning via Synthetic Videos

Forget real-world video datasets: training VLMs on just 7.7K synthetic videos with temporal primitives beats 165K real-world examples, unlocking surprisingly effective transfer learning for video reasoning.

Songtao Jiang, Sibo Song, Chenyi Zhou +7

Computer Vision, Data Curation & Synthetic Data, Multimodal Models

Mar 17, 2026

Mar 17, 2026·also PKU, UW-Madison

Dual Consensus: Escaping from Spurious Majority in Unsupervised RLVR via Two-Stage Vote Mechanism

LLMs can escape the trap of converging on popular but incorrect answers in unsupervised RLVR by temporarily "unlearning" and exploring diverse response options.

Kaixuan Du, Hang Zhang, Yukun Wang +2

Reasoning & Chain-of-Thought, RLHF & Preference Learning, Training Efficiency & Optimization

Mar 1, 2026

Mar 1, 2026·also Tsinghua AI

Egocentric Co-Pilot: Web-Native Smart-Glasses Agents for Assistive Egocentric AI

Smart glasses powered by web-native AI agents can now outperform commercial solutions in assistive tasks, offering a practical path to always-on, context-aware help for users navigating daily life.

Sicheng Yang, Weitong Cai, Shitong Sun +7

Computer Vision, Multimodal Models, Tool Use & Agents

Feb 25, 2026

Tsinghua AI·Feb 25, 2026·also Group

MindDriver: Introducing Progressive Multimodal Reasoning for Autonomous Driving

Autonomous driving gets a human-like reasoning boost: MindDriver uses progressive multimodal reasoning to bridge the gap between semantic understanding and physical trajectory planning.

Lingjun Zhang, Yujian Yuan, Changjie Wu +6

Multimodal Models, Reasoning & Chain-of-Thought, World Models & Planning
