Search papers, labs, and topics across Lattice.
2
0
4
2
Aligning rewards with sub-goals and emphasizing key trajectory segments with hindsight information significantly improves multi-turn agentic RL, outperforming existing methods on complex tasks.
Constraining initial state representations with a simple Tanh activation and skip connections can significantly boost off-policy RL performance, rivaling more complex methods on continuous control tasks.