Huangxin Lin

South China University of Technology

Research focus

Eval Frameworks & Benchmarks (1), Tool Use & Agents (1)

Frequent co-authors

Chenxing Li (1), Chenxin Li (1), Zhengyang Tang (1), Yunlong Lin (1)

Papers (1)

Apr 30, 2026 · also HKU, HKUST, PKU, SCUT +2

Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

LLM agents still fail to reliably automate real-world workflows, with even the best models succeeding on only two-thirds of tasks in a new live benchmark.

Chenxing Li, Chenxin Li, Zhengyang Tang +9

Eval Frameworks & Benchmarks, Tool Use & Agents
