Zhiheng Xi

Research focus

Tool Use & Agents (4), RLHF & Preference Learning (2), Scientific Discovery & Drug Design (2), Eval Frameworks & Benchmarks (2)

Frequent co-authors

Xuanjing Huang (4), Tao Gui (3), Shichun Liu (2), Shihan Dou (2)

Papers (5)

May 28, 2026 · also Shanghai AI Lab

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

AgentDoG 1.5 proves you can achieve GPT-5.4-level agent safety with open-source models trained on just 1k samples, slashing deployment overhead by two orders of magnitude.

Dongrui Liu, Yu Li, Zhonghao Yang +54

Constitutional AI & AI Ethics, Red-Teaming & Adversarial Robustness, Tool Use & Agents

Apr 15, 2026 · also Fudan

MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning

Multi-turn reinforcement learning gets a boost: weighting trajectories by semantic similarity dramatically improves baseline estimation and agent performance in long-document visual QA.

Jiahang Lin, Kai Hu, Binghai Wang +11

Multimodal Models, Recommendation & Information Retrieval, Tool Use & Agents

Mar 15, 2026 · also ByteDance, HuggingFace, McMaster University, Oxford

AI Can Learn Scientific Taste

Forget benchmarks: AI can now learn "scientific taste" and propose research ideas with higher potential impact than humans, thanks to a novel reinforcement learning approach using citation data.

Jingqi Tong, Mingzhe Li, Hangcheng Li +14

RLHF & Preference Learning, Scientific Discovery & Drug Design

Mar 12, 2026 · also Fudan

Can RL Improve Generalization of LLM Agents? An Empirical Study

RFT's impressive in-domain performance masks surprisingly weak generalization to new environments, highlighting a critical challenge for deploying LLM agents in the real world.

Zhiheng Xi, Jiazheng Zhang, Yutao Fan +8

Eval Frameworks & Benchmarks, RLHF & Preference Learning, Tool Use & Agents

Feb 13, 2026 · also Fudan

SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents

GPT-5's scientific reasoning skills plummet by nearly 50% when tackling multi-step workflows, revealing a critical gap in current LLM agents' ability to orchestrate complex tool use.

Yujiong Shen, Yajie Yang, Zhiheng Xi +11

Eval Frameworks & Benchmarks, Scientific Discovery & Drug Design, Tool Use & Agents
