Search papers, labs, and topics across Lattice.
2
0
5
11
Sustained self-improvement in LLM agents is achievable through a novel adaptive framework that outperforms traditional methods in dynamic task environments.
Intrinsic reward signals in unsupervised RL for LLMs inevitably collapse due to sharpening of the model's prior, but external rewards grounded in computational asymmetries offer a path to sustained scaling.