NLPR & MAIS, Institute of Automation, Chinese Academy of Sciences
Forget slow and steady: "Fast Thinking" prompts, combined with carefully tuned reward functions and REINFORCE, can dramatically boost the performance of RL-trained research agents.