Wenbo Chen

Research focus

Eval Frameworks & Benchmarks (2), Tool Use & Agents (2), Natural Language Processing (1)

Frequent co-authors

Xiangyi Li (2), K. Choe (1), Yiming Liu (1), Xiaokun Chen (1)

Papers (2)

Apple ML · Apr 6, 2026

ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces

LLM agents automating productivity tasks achieve only moderate success (39-64%) while exhibiting surprisingly high rates of unsafe actions (7-33%) in realistic, multi-service workflows.

Xiangyi Li, K. Choe, Yiming Liu +12

Eval Frameworks & Benchmarks, Natural Language Processing, Tool Use & Agents

BAIR · Feb 13, 2026 · also Independent researcher, KAUST, PKU, Provable Responsible AI and Data +5

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

LLMs can't reliably generate the very skills that boost their performance, and smaller models equipped with expert-crafted skills can rival larger, skill-less models.

Xiangyi Li, Wenbo Chen, Yimin Liu +36

Eval Frameworks & Benchmarks, Tool Use & Agents
