Shichun Liu

Papers on Lattice

Total citations

Topics

Publication activitypapers/week, last 8 weeks

Research focus

Eval Frameworks & Benchmarks (3)Tool Use & Agents (3)RLHF & Preference Learning (2)Natural Language Processing (1)

Frequent co-authors

Jiahang Lin (4)Zhenhua Han (3)Xuanjing Huang (3)Shihan Dou (2)

Papers (5)

Apr 29, 2026

Tsinghua AI2d ago·also Xi'an Jiaotong-Liverpool University

CL-bench Life: Can Language Models Learn from Real-Life Context?

Today's best language models can barely make sense of your messy group chats and fragmented digital life, achieving only 19% accuracy on a new benchmark of real-world reasoning.

Shihan Dou, Yujiong Shen, Chenhao Huang +35

Eval Frameworks & Benchmarks Natural Language Processing

Apr 28, 2026

3d ago·also PKU, Shanghai Qiji Zhifeng Co

Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses

Coding agents can now evolve their own harnesses to outperform human-designed ones, thanks to a novel observability-driven approach.

Jiahang Lin, Shichun Liu, Chengjun Pan +6

Code Generation & Program Synthesis Eval Frameworks & Benchmarks Tool Use & Agents

Apr 21, 2026

1w ago·also Fudan, Shanghai AI Lab, Shanghai Qiji Zhifeng Co.

EVPO: Explained Variance Policy Optimization for Adaptive Critic Utilization in LLM Post-Training

Learned critics in RLHF can actually *increase* variance and hurt performance in sparse-reward settings, but a simple explained variance metric can tell you when to ditch the critic and get better results.

Chengjun Pan, Shichun Liu, Jiahang Lin +8

RLHF & Preference Learning Training Efficiency & Optimization

Apr 15, 2026

Jiahang Lin +162w ago·also Fudan

MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning

Multi-turn reinforcement learning gets a boost: weighting trajectories by semantic similarity dramatically improves baseline estimation and agent performance in long-document visual QA.

Jiahang Lin, Kai Hu, Binghai Wang +14

Multimodal Models Recommendation & Information Retrieval Tool Use & Agents

Mar 12, 2026

Can RL Improve Generalization of LLM Agents? An Empirical Study

RFT's impressive in-domain performance masks surprisingly weak generalization to new environments, highlighting a critical challenge for deploying LLM agents in the real world.

Zhiheng Xi, Jiazheng Zhang, Yutao Fan +8

Eval Frameworks & Benchmarks RLHF & Preference Learning Tool Use & Agents

Search

Shichun Liu

Publication activitypapers/week, last 8 weeks

Research focus

Frequent co-authors

Papers (5)