Search papers, labs, and topics across Lattice.
2
0
5
Self-play can be dramatically improved by exploiting the "question construction path" it generates as privileged information for self-distillation, leading to 2-3x faster learning.
Agentic RL agents can learn faster and perform better by dynamically maintaining a skill bank that combines high-level task guidance with low-level step-by-step decision support.