Zhengyang Tang

Forget hand-crafting mobile benchmarks – PhoneWorld lets you automatically generate them from real-world GUI trajectories, leading to massive performance gains for phone-use agents.

Zhengyang Tang, Yuxuan Liu, X. Lai +22

Eval Frameworks & Benchmarks · Tool Use & Agents · World Models & Planning

Deli Huang +14 · May 27, 2026 · also Meituan

ATLAS: All-round Testing of Long-context Abilities across Scales

Long-context LLM rankings dramatically reshuffle when evaluated across a range of context lengths and capabilities, proving that a single headline score is misleading.

Deli Huang, Cunguang Wang, Hongyin Tang +12

Eval Frameworks & Benchmarks · Natural Language Processing · Recommendation & Information Retrieval

Apr 30, 2026 · also HKU, HKUST, PKU, SCUT +1

Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

LLM agents still fail to reliably automate real-world workflows, with even the best models succeeding on only two-thirds of tasks in a new live benchmark.

Chenxing Li, Chenxin Li, Zhengyang Tang +9

Eval Frameworks & Benchmarks · Tool Use & Agents

Apr 17, 2026 · also DualverseAI

Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning

Pruning reasoning paths with a learned "STOP" token slashes compute costs and boosts accuracy in large reasoning models, outperforming existing methods.

Jiaxi Bi, Tongxu Luo, Wenyu Du +2

Inference & Quantization · Reasoning & Chain-of-Thought · Training Efficiency & Optimization

Zhengyang Tang +22 · Apr 1, 2026 · also SJTU, Tencent AI

Do Phone-Use Agents Respect Your Privacy?

Current phone-use agents are often *too* helpful, routinely violating user privacy by filling in unnecessary personal information even when a task doesn't require it.

Zhengyang Tang, Ke Ji, Xidong Wang +20

Constitutional AI & AI Ethics · Eval Frameworks & Benchmarks · Tool Use & Agents


Zhengyang Tang


Papers (9)