Aligning rewards with sub-goals and using hindsight information to emphasize key trajectory segments significantly improves multi-turn agentic reinforcement learning (RL), outperforming existing methods on complex tasks.