Chenxin An

Papers on Lattice

Total citations

Topics

h-index

Research focus

Eval Frameworks & Benchmarks (1), Red-Teaming & Adversarial Robustness (1), Tool Use & Agents (1)

Frequent co-authors

Bowen Ye (1), Rang Li (1), Qibin Yang (1), Yuanxin Liu (1)

Papers (1)

Apr 7, 2026

Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents

Current autonomous agent benchmarks miss nearly half of safety violations and over 10% of robustness failures because they only check final outputs, a problem Claw-Eval directly addresses.

Bowen Ye, Rang Li, Qibin Yang +8

Eval Frameworks & Benchmarks, Red-Teaming & Adversarial Robustness, Tool Use & Agents
