Multi-turn RL agents can learn far more effectively by explicitly monitoring and controlling uncertainty at both the token and turn levels, leading to more stable training and higher performance.
ARLArena reveals the hidden instability of agentic RL, offering a path to more reliable LLM-based agents via a novel stable policy optimization method (SAMPO).