Jian Liang

NLPR & MAIS, Institute of Automation, Chinese Academy of Sciences

Research focus

Recommendation & Information Retrieval (1), RLHF & Preference Learning (1), Tool Use & Agents (1)

Frequent co-authors

Yinuo Xu (1), Shuo Lu (1), Jianjie Cheng (1)

Papers (1)

Feb 23, 2026

Tsinghua AI · Feb 23, 2026 · also CAS, Meituan

How to Train Your Deep Research Agent? Prompt, Reward, and Policy Optimization in Search-R1

Forget slow and steady: "Fast Thinking" prompts, combined with carefully tuned reward functions and REINFORCE, can dramatically boost the performance of RL-trained research agents.

Yinuo Xu, Shuo Lu, Jianjie Cheng +5

Recommendation & Information Retrieval, RLHF & Preference Learning, Tool Use & Agents