Benyou Wang

Papers on Lattice

Total citations

Topics

h-index

Research focus

Tool Use & Agents (4), Eval Frameworks & Benchmarks (3), Reasoning & Chain-of-Thought (2), World Models & Planning (1), RLHF & Preference Learning (1)

Frequent co-authors

Zhengyang Tang (4), X. Lai (2), Pengyuan Lyu (2), Yiduo Guo (2)

Papers (5)

May 28, 2026

PhoneWorld: Scaling Phone-Use Agent Environments

Forget hand-crafting mobile benchmarks: PhoneWorld generates them automatically from real-world GUI trajectories, leading to massive performance gains for phone-use agents.

Zhengyang Tang, Yuxuan Liu, X. Lai +22

Eval Frameworks & Benchmarks, Tool Use & Agents, World Models & Planning

May 25, 2026

CRPO: Character-centric Group Relative Policy Optimization for Role-aware Reasoning in Role-playing Agents

RL fine-tuning can make your role-playing agent *worse* at embodying its character, unless you carefully balance task rewards with stylistic constraints.

Yihong Tang, Kehai Chen, Liang Yue +1

Reasoning & Chain-of-Thought, RLHF & Preference Learning, Tool Use & Agents

Apr 30, 2026 · also HKU, HKUST, PKU, SCUT +1

Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

LLM agents still fail to reliably automate real-world workflows, with even the best models succeeding on only two-thirds of tasks in a new live benchmark.

Chenxin Li, Chenxing Li, Zhengyang Tang +9

Eval Frameworks & Benchmarks, Tool Use & Agents

Apr 17, 2026 · also DualverseAI

Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning

Pruning reasoning paths with a learned "STOP" token slashes compute costs and boosts accuracy in large reasoning models, outperforming existing methods.

Jiaxi Bi, Tongxu Luo, Wenyu Du +2

Inference & Quantization, Reasoning & Chain-of-Thought, Training Efficiency & Optimization

Apr 1, 2026 · also SJTU, Tencent AI

Do Phone-Use Agents Respect Your Privacy?

Current phone-use agents are often *too* helpful, routinely violating user privacy by filling in unnecessary personal information even when a task doesn't require it.

Zhengyang Tang, Ke Ji, Xidong Wang +20

Constitutional AI & AI Ethics, Eval Frameworks & Benchmarks, Tool Use & Agents
