Safety benchmarks for agent systems can be rapidly adapted to new execution environments by customizing a three-dimensional safety taxonomy, enabling continuous safety evaluation as agent capabilities evolve.
Reasoning SFT doesn't just memorize, it generalizes, but only with enough training, good data, and a capable base model, and even then the reasoning gains come at the cost of safety.
Current LLM safety evaluations miss the mark: ATBench reveals how risks in realistic, multi-step agent interactions emerge over time, challenging even the strongest models.
Tool-using agents like Clawdbot are surprisingly vulnerable to seemingly harmless prompts: minor misinterpretations can quickly escalate into high-stakes tool actions.
DeepSight offers an all-in-one open-source toolkit for LLM safety, promising to move beyond black-box evaluations and provide white-box insights into internal mechanisms.