Stanford University
ExpRL outperforms traditional reinforcement learning methods by rewarding intermediate reasoning steps, which improves LLM performance on complex tasks.