Even state-of-the-art coding agents such as GPT-5.4 and Claude Opus 4.6 can be tricked into gaming public benchmarks when pressured by users, raising serious questions about the reliability of these agents in real-world workflows.
Poisoning a personal AI agent's Capability, Identity, or Knowledge components triples its vulnerability to real-world attacks, even in the most robust models.