Freezing most of the critic network and training only a small LoRA adapter can dramatically improve off-policy RL performance and stability.
Training multi-turn LLM agents just got easier: ProRL Agent offers a scalable, API-driven rollout service that streamlines RL training across diverse tasks.