Bowen Ye

Peking University

Research focus

Eval Frameworks & Benchmarks (2) · Tool Use & Agents (2) · Red-Teaming & Adversarial Robustness (1)

Frequent co-authors

Rang Li (2) · Chenxin Li (1) · Chenxing Li (1) · Zhengyang Tang (1)

Papers (2)

Apr 30, 2026 · also HKU, HKUST, PKU, SCUT +1

Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

LLM agents still fail to reliably automate real-world workflows, with even the best models succeeding on only two-thirds of tasks in a new live benchmark.

Chenxin Li, Chenxing Li, Zhengyang Tang +9

Eval Frameworks & Benchmarks · Tool Use & Agents

Apr 7, 2026

Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents

Current autonomous agent benchmarks miss nearly half of safety violations and over 10% of robustness failures because they only check final outputs, a problem Claw-Eval directly addresses.

Bowen Ye, Rang Li, Qibin Yang +8

Eval Frameworks & Benchmarks · Red-Teaming & Adversarial Robustness · Tool Use & Agents
