LLM agents can internalize skills via in-context RL, achieving zero-shot autonomous behavior without the token overhead and retrieval noise of traditional methods.
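To make the mechanism concrete, here is a minimal sketch of the kind of in-context RL loop this alludes to. The `llm` and `env` interfaces, the prompt format, and `format_history` are illustrative assumptions, not the paper's actual setup:

```python
# Minimal sketch of an in-context RL loop: the agent's only learning signal
# is the transition history accumulated in its own prompt. The `llm`/`env`
# interfaces and the prompt format are assumptions for illustration.

def format_history(history):
    """Render past (observation, action, reward) transitions as prompt text."""
    return "\n".join(
        f"Observation: {o}\nAction: {a}\nReward: {r}" for o, a, r in history
    )

def in_context_rl(llm, env, episodes=10):
    """Improve behavior across episodes with no weight updates and no
    external retrieval: experience lives entirely in the context window."""
    history = []  # the agent's only memory
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            prompt = format_history(history) + f"\nObservation: {obs}\nAction:"
            action = llm(prompt).strip()           # policy = conditioned LLM
            obs, reward, done = env.step(action)   # environment transition
            history.append((obs, action, reward))  # internalized, not retrieved
    return history
```

The contrast with retrieval is that nothing is fetched per step: the accumulated transitions condition the next action directly, which is where the claimed savings in tokens and retrieval noise would come from.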
Forget hand-tuning rollout budgets: $V_{0.5}$ dynamically allocates compute to sparse RL rollouts based on a real-time statistical test of a generalist value model's prior, slashing variance and boosting performance.
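A back-of-the-envelope sketch of that allocation rule, assuming a simple two-sided z-test of the empirical return against the value prior; the function names, thresholds, and the final blending step are illustrative choices, not the paper's actual $V_{0.5}$ procedure:

```python
import math
from statistics import mean, stdev

def allocate_rollouts(prompt, value_prior, run_rollout,
                      min_rollouts=2, max_rollouts=16, z_crit=1.96):
    """Spend rollouts only where the empirical return disagrees with the
    generalist value model's prior. Hypothetical sketch: `value_prior` and
    `run_rollout` are caller-supplied stand-ins, and the z-test is an
    assumed form of the "real-time statistical test"."""
    prior = value_prior(prompt)                        # value model's estimate
    returns = [run_rollout(prompt) for _ in range(min_rollouts)]
    while len(returns) < max_rollouts:
        se = stdev(returns) / math.sqrt(len(returns))  # std. error of the mean
        z = abs(mean(returns) - prior) / max(se, 1e-8)
        if z < z_crit:       # prior is consistent with the evidence so far:
            break            # stop early and bank the saved compute
        returns.append(run_rollout(prompt))            # disagreement: sample more
    # Shrink the noisy empirical mean toward the prior in proportion to how
    # little evidence was gathered (a simple variance-reduction step).
    w = len(returns) / max_rollouts
    return w * mean(returns) + (1 - w) * prior
```

Under this reading, sampling stops as soon as the test fails to reject the prior, so easy prompts consume the minimum budget while hard, high-variance ones receive the full allocation.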